Abstract. We prove mean convergence, as $N\to\infty$, for the multiple ergodic averages $\frac{1}{N}\sum_{n=1}^{N}f_1(T_1^{p_1(n)}x)\cdot\ldots\cdot f_\ell(T_\ell^{p_\ell(n)}x)$, where $p_1,\ldots,p_\ell$ are integer polynomials with distinct
degrees, and $T_1,\ldots,T_\ell$ are commuting, invertible measure preserving transformations, acting
on the same probability space. This establishes several cases of a conjecture of Bergelson
and Leibman that complement the case of linear polynomials, recently established by Tao.
Furthermore, we show that, unlike the case of linear polynomials, for polynomials of distinct
degrees the corresponding characteristic factors are mixtures of inverse limits of nilsystems.
We use this particular structure, together with some equidistribution results on nilmanifolds,
to give an application to multiple recurrence and a corresponding one to combinatorics.
1. Main results, ideas in the proofs, and further directions

1.1. Introduction and main results. A well studied and difficult problem in ergodic theory 
is the analysis of the limiting behavior of multiple ergodic averages of commuting transforma- 
tions taken along polynomial iterates. A related conjecture of Bergelson and Leibman (given 
explicitly in [5]) states the following: 

Conjecture. Let $(X,\mathcal{X},\mu)$ be a probability space, $T_1,\ldots,T_\ell\colon X\to X$ be commuting, invertible
measure preserving transformations, $f_1,\ldots,f_\ell\in L^\infty(\mu)$, and $p_1,\ldots,p_\ell\in\mathbb{Z}[t]$.
Then the limit

$$\lim_{N\to\infty}\frac{1}{N}\sum_{n=1}^{N}f_1(T_1^{p_1(n)}x)\cdot\ldots\cdot f_\ell(T_\ell^{p_\ell(n)}x)\tag{1}$$

exists in $L^2(\mu)$.

Special forms of the averages in (1) were introduced and studied by Furstenberg [19], Furstenberg
and Katznelson [20], and Bergelson and Leibman [8], in a depth that was sufficient for
them to establish the theorem of Szemerédi on arithmetic progressions and its multidimensional
and polynomial extensions, respectively.

Proving convergence of these averages turned out to be a harder problem. When all the
transformations $T_1,\ldots,T_\ell$ are equal, convergence was established after a long series of
intermediate results; the papers [19], [11], [12], [13], [30], [23], [33] dealt with the important case of linear
polynomials, and, using the machinery introduced in [23], convergence for arbitrary polynomials
was finally obtained in [24], except for a few cases that were treated in [28]. For general
commuting transformations, progress has been scarcer. When all the polynomials in (1) are
linear, after a series of partial results [20], [11], [29], [33], [16] that were obtained using ergodic
theory, convergence was established in [31] using a finitary argument. Subsequently, motivated
by ideas from [31], several other proofs of this "linear" result were found using non-standard
analysis [32], and then ergodic theory [2], [22]. Proofs of convergence for general polynomial
iterates have been given only under very strong ergodicity assumptions [5], [27]. On the other
hand, very recently, in [3], [4] techniques from [2] have been refined and extended, aiming to
eventually handle the case of general polynomial iterates. Despite such intense efforts, for general
commuting transformations, apart from the case where all the polynomials are linear, no
other instance of the conjecture of Bergelson and Leibman has been resolved. In this article,
we are going to establish this conjecture when the polynomial iterates have distinct degrees:

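Theorem 1.1. Let $(X,\mathcal{X},\mu)$ be a probability space, $T_1,\ldots,T_\ell\colon X\to X$ be commuting, invertible
measure preserving transformations, and $f_1,\ldots,f_\ell\in L^\infty(\mu)$. Suppose that the polynomials
$p_1,\ldots,p_\ell\in\mathbb{Z}[t]$ have distinct degrees.
Then the limit

$$\lim_{N-M\to\infty}\frac{1}{N-M}\sum_{n=M}^{N-1}f_1(T_1^{p_1(n)}x)\cdot\ldots\cdot f_\ell(T_\ell^{p_\ell(n)}x)\tag{2}$$

exists in $L^2(\mu)$.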

Unlike previous arguments in [11], [13], [31], [32], [2], [22], where one finds ways to sidestep the
problem of giving precise algebraic descriptions of the factor systems that control the limiting
behavior of special cases of the averages (2), a distinctive feature of the proof of Theorem 1.1 is
that we give such descriptions.¹ Furthermore, we did not find it advantageous to work within a
suitable extension of our system in order to simplify our study (like the "pleasant" or "magic"
extensions that were introduced in [2] and in [22], respectively). In this respect, our analysis is
more closely related to the one used to study convergence results when all the transformations
$T_1,\ldots,T_\ell$ are equal, and in fact it uses this well-developed single transformation theory in a
crucial way (in some special cases our approach leads to very concise proofs; see Appendix A).

¹ A key difference between the averages of $f_1(T_1^nx)\cdot f_2(T_2^nx)$ and the averages of $f_1(T_1^nx)\cdot f_2(T_2^{n^2}x)$ is that
when $T_1=T_2$ the first one becomes "degenerate" (it equals the averages of $(f_1\cdot f_2)(T_1^nx)$), and this complicates the
structure of the possible factors that control their limiting behavior. However, no choice of $T_1,T_2$ makes
the second average "degenerate".

The next result gives the description of the aforementioned factors (the factors $Z_{k,T}$ are defined
in Section 2.2):



Theorem 1.2. Let $(X,\mathcal{X},\mu)$ be a probability space, $T_1,\ldots,T_\ell\colon X\to X$ be commuting,
invertible measure preserving transformations, and $f_1,\ldots,f_\ell\in L^\infty(\mu)$. Let $p_1,\ldots,p_\ell\in\mathbb{Z}[t]$ be
polynomials with distinct degrees and maximum degree $d$.

Then there exists $k=k(d,\ell)\in\mathbb{N}$ such that: If $f_i\perp Z_{k,T_i}$ for some $i\in\{1,\ldots,\ell\}$, then

$$\lim_{N-M\to\infty}\frac{1}{N-M}\sum_{n=M}^{N-1}f_1(T_1^{p_1(n)}x)\cdot\ldots\cdot f_\ell(T_\ell^{p_\ell(n)}x)=0$$

in $L^2(\mu)$.



Factors that satisfy the aforementioned convergence property are often called characteristic
factors. The utility of the characteristic factors obtained in Theorem 1.2 stems from the fact
that each individual factor is a mixture of systems of algebraic origin; in particular, it is a
mixture of inverse limits of nilsystems [23] (see also Theorem 2.1). Using this algebraic description
of the characteristic factors (in fact, its consequence Proposition 3.1 is more suitable for our
needs), and some equidistribution results on nilmanifolds, we give the following application to
multiple recurrence:

Theorem 1.3. Let $(X,\mathcal{X},\mu)$ be a probability space and $T_1,\ldots,T_\ell\colon X\to X$ be commuting,
invertible measure preserving transformations.

Then for every set $A\in\mathcal{X}$, every choice of distinct positive integers $d_1,\ldots,d_\ell$, and every $\varepsilon>0$, the set

$$\{n\in\mathbb{N}\colon\mu(A\cap T_1^{-n^{d_1}}A\cap\cdots\cap T_\ell^{-n^{d_\ell}}A)>\mu(A)^{\ell+1}-\varepsilon\}\tag{3}$$

has bounded gaps.
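To make the quantity in (3) concrete, here is a toy computation on a finite cyclic group $\mathbb{Z}/q\mathbb{Z}$ with uniform measure and two commuting rotations. The group, the set $A$, the shifts, and the helper `intersection_measure` are hypothetical choices made only for illustration. For such periodic systems every multiple of $q$ is trivially a return time, so the bounded gaps conclusion is automatic here; the theorem has content only for infinite systems.

```python
from fractions import Fraction


def intersection_measure(A, q, shifts, degrees, n):
    """mu(A ∩ T_1^{-n^{d_1}} A ∩ ... ∩ T_l^{-n^{d_l}} A) on Z/qZ with uniform measure mu,
    where T_i x = x + shifts[i] (mod q). Note that T^{-m} A = {x : x + m*shift in A}."""
    S = set(A)
    for shift, d in zip(shifts, degrees):
        m = n ** d
        S &= {x for x in range(q) if (x + m * shift) % q in A}
    return Fraction(len(S), q)


# Hypothetical toy data: T_1 x = x + 1 and T_2 x = x + 5 on Z/12Z, exponents n and n^2.
q, A = 12, {0, 1, 2, 5}
shifts, degrees = (1, 5), (1, 2)
eps = Fraction(1, 100)
threshold = Fraction(len(A), q) ** (len(shifts) + 1) - eps  # mu(A)^(l+1) - eps

good = [n for n in range(1, 200) if intersection_measure(A, q, shifts, degrees, n) > threshold]
gaps = [b - a for a, b in zip(good, good[1:])]
print(good[:10], max(gaps))
```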

If the integers are not distinct, say $d_1=d_2$, then the result fails. For example, one can
take $T_2=T_1^2$, and choose the (non-ergodic) transformation $T_1$ and the set $A$ so that (3)
fails with any power of $\mu(A)$ on the right hand side, for every $n\in\mathbb{N}$ (Theorem 2.1 in [7]). If
$\ell=2$, $d_1=d_2=1$, and the joint action of the transformations $T_1,T_2$ is ergodic, then the
result remains true up to a change of the exponent on the right hand side [10]. But even
under similar ergodicity assumptions, the result probably fails when three exponents agree, no
matter what exponent one uses on the right hand side (a conditional counterexample appears
in Proposition 5.2 of [14]).

It will be clear from our argument that in the statement of Theorem 1.3 we can replace the
polynomials $n^{d_1},\ldots,n^{d_\ell}$ by any collection of polynomials $p_1,\ldots,p_\ell\in\mathbb{Z}[t]$ with zero constant
terms that satisfy $t^{\deg(p_i)+1}\mid p_{i+1}$ for $i=1,\ldots,\ell-1$. For example, $\{n,\,n^3+n^2,\,n^5+n^4\}$ is such
a family. On the other hand, our argument does not work for all polynomials with distinct
degrees (the problem is to find a replacement for Lemma 7.6), but the same lower bounds
are expected to hold for any collection of rationally independent integer polynomials with zero
constant terms.
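The divisibility condition above is easy to test mechanically. A minimal sketch, assuming polynomials are encoded as lists of integer coefficients, lowest degree first (an encoding chosen here for illustration), and that the family is listed in the order $p_1,\ldots,p_\ell$ used in the condition:

```python
from typing import List

# Illustrative encoding: [0, 0, 1, 1] stands for t^2 + t^3.
Poly = List[int]


def degree(p: Poly) -> int:
    """Degree of a nonzero polynomial."""
    for d in range(len(p) - 1, -1, -1):
        if p[d] != 0:
            return d
    raise ValueError("the zero polynomial has no degree")


def divisible_by_t_power(q: Poly, m: int) -> bool:
    """True if t^m divides q, i.e. the coefficients of 1, t, ..., t^(m-1) vanish."""
    return all(c == 0 for c in q[:m])


def satisfies_chain_condition(polys: List[Poly]) -> bool:
    """Zero constant terms and t^(deg p_i + 1) | p_(i+1) for i = 1, ..., l-1."""
    if not all(divisible_by_t_power(p, 1) for p in polys):
        return False
    return all(divisible_by_t_power(q, degree(p) + 1) for p, q in zip(polys, polys[1:]))


print(satisfies_chain_condition([[0, 1], [0, 0, 1, 1], [0, 0, 0, 0, 1, 1]]))  # {n, n^3+n^2, n^5+n^4}: True
print(satisfies_chain_condition([[0, 1], [0, 1, 1]]))                          # {n, n^2+n}: False
```

Monomials $n^{d_1},\ldots,n^{d_\ell}$ listed by increasing degree always pass the check, which is consistent with the remark that the condition generalizes Theorem 1.3.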
Using a multidimensional version of Furstenberg's correspondence principle (see [20] or [8])
it is straightforward to give a combinatorial consequence of this result. We leave the routine 
details of the proof to the interested reader. 

Theorem 1.4. Let $k,\ell\in\mathbb{N}$, $A\subset\mathbb{Z}^k$ with $\bar{d}(A)>0$,² and $v_1,\ldots,v_\ell$ be vectors in $\mathbb{Z}^k$.
Then for every choice of distinct positive integers $d_1,\ldots,d_\ell$, and every $\varepsilon>0$, the set

$$\{n\in\mathbb{N}\colon\bar{d}(A\cap(A+n^{d_1}v_1)\cap\cdots\cap(A+n^{d_\ell}v_\ell))>\bar{d}(A)^{\ell+1}-\varepsilon\}\tag{4}$$

has bounded gaps.

1.2. Ideas in the proofs of the main results. 

1.2.1. Key ingredients. The proofs of Theorems 1.1, 1.2, and 1.3 use several ingredients.

Van der Corput's Lemma. We are going to use repeatedly the following variation of the classical
elementary lemma of van der Corput. Its proof is a straightforward modification of the one
given in [5].

Van der Corput's Lemma. Let $(v_n)_{n\in\mathbb{N}}$ be a bounded sequence of vectors in a Hilbert space.
Let

$$b_h=\limsup_{N-M\to\infty}\left|\frac{1}{N-M}\sum_{n=M}^{N-1}\langle v_{n+h},v_n\rangle\right|.$$

Suppose that

$$\limsup_{H\to\infty}\frac{1}{H}\sum_{h=1}^{H}b_h=0.$$

Then

$$\lim_{N-M\to\infty}\left\|\frac{1}{N-M}\sum_{n=M}^{N-1}v_n\right\|=0.$$

In most applications we have $b_h=0$ for every sufficiently large $h$, or for "almost every $h$",
meaning that the exceptional set has zero upper density.
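Lemmas of this type rest on a finite inequality. The sketch below is our own illustration, and we do not claim it is the form used in [5]: it checks the classical finite van der Corput inequality
$H^2\big|\sum_{n=1}^{N}u_n\big|^2\le H(N+H-1)\sum_{n=1}^{N}|u_n|^2+2(N+H-1)\sum_{h=1}^{H-1}(H-h)\,\mathrm{Re}\sum_{n=1}^{N-h}\langle u_{n+h},u_n\rangle$
for complex scalar sequences (the Hilbert space $\mathbb{C}$). Dividing by $H^2N^2$ and letting first $N\to\infty$ and then $H\to\infty$ is the usual route from such an inequality to the lemma. The helper name `vdc_sides` is hypothetical.

```python
import cmath
import math
import random


def e(t: float) -> complex:
    """e(t) = exp(2 pi i t), as in the notation of Section 1.4."""
    return cmath.exp(2j * math.pi * t)


def vdc_sides(u, H):
    """Both sides of the finite van der Corput inequality for u_1, ..., u_N and H >= 1:

    H^2 |sum_n u_n|^2  <=  H (N+H-1) sum_n |u_n|^2
                          + 2 (N+H-1) sum_{h=1}^{H-1} (H-h) Re sum_{n=1}^{N-h} u_{n+h} conj(u_n)
    """
    N = len(u)
    lhs = H**2 * abs(sum(u)) ** 2
    diagonal = H * (N + H - 1) * sum(abs(z) ** 2 for z in u)
    correlations = 0.0
    for h in range(1, H):
        c = sum(u[n + h] * u[n].conjugate() for n in range(N - h))
        correlations += (H - h) * c.real
    return lhs, diagonal + 2 * (N + H - 1) * correlations


# Quadratic phases u_n = e(n^2 sqrt(2)): the correlations u_{n+h} conj(u_n) = e((2nh + h^2) sqrt(2))
# are linear phases in n, so they average out; this is the mechanism the lemma exploits.
u = [e(n * n * math.sqrt(2)) for n in range(1, 2001)]
lhs, rhs = vdc_sides(u, H=40)
print(lhs <= rhs, abs(sum(u)) / len(u))
```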

An approximation result. This enables us in several instances to replace sequences of the form
$(f(T^nx))_{n\in\mathbb{N}}$, where $f$ is a $Z_{k,T}$-measurable function, with $k$-step nilsequences. For ergodic
systems, this result is an easy consequence of the structure theorem for the factors $Z_{k,T}$ (Theorem 2.1).
But we need a non-ergodic version (Proposition 3.1), which is harder to establish; in our
context we cannot assume that each individual transformation is ergodic.

Nilsequence correlation estimates. These, roughly speaking, assert that "uniform" sequences
do not correlate with nilsequences (see, for example, Theorem 6.2).



Equidistribution results on nilmanifolds. These will only be used in the proof of Theorem 1.3
(see Section 7).



² For a set $A\subset\mathbb{Z}^k$ we define its upper density by $\bar{d}(A)=\limsup_{N\to\infty}|A\cap[-N,N]^k|/(2N)^k$.
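The ratio inside the limsup can be computed directly for small $N$. A minimal sketch, assuming a set $A\subset\mathbb{Z}^k$ given by a membership test; the example set $\{(a,b)\in\mathbb{Z}^2\colon a\equiv b \pmod 3\}$ and the function names are hypothetical choices for illustration, and its upper density is $1/3$.

```python
from fractions import Fraction
from itertools import product


def density_ratio(in_A, k: int, N: int) -> Fraction:
    """|A ∩ [-N, N]^k| / (2N)^k, for a set A ⊂ Z^k given by a membership test."""
    count = sum(1 for v in product(range(-N, N + 1), repeat=k) if in_A(v))
    return Fraction(count, (2 * N) ** k)


def in_A(v) -> bool:
    """Illustrative set A = {(a, b) in Z^2 : a ≡ b (mod 3)}."""
    a, b = v
    return (a - b) % 3 == 0


print(density_ratio(in_A, 2, 3))    # 17/36
print(density_ratio(in_A, 2, 300))  # 120401/360000, close to 1/3
```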



ERGODIC AVERAGES OF COMMUTING TRANSFORMATIONS WITH DISTINCT DEGREE... 5 

1.2.2. Combining the key ingredients. We first prove Theorem 1.2, which provides convenient
characteristic factors for the multiple ergodic averages in (2). Its proof proceeds in two steps:
(i) in Sections 4 and 5 we use a PET-induction argument based on successive uses of van der
Corput's Lemma to find a characteristic factor for the transformation that corresponds to the
highest degree polynomial iterate; and (ii) in Section 6 we combine step (i) with the
aforementioned approximation result and nilsequence correlation estimates to find characteristic
factors for the other transformations as well.

The strategy for proving Theorems 1.1 and 1.3 can be summarized as follows: In order
to study the limit (2), we first use Theorem 1.2 to reduce matters to the case where all the
functions $f_i$ are $Z_{k,T_i}$-measurable for some $k\in\mathbb{N}$, and then use the aforementioned approximation
result to reduce matters to establishing certain convergence or equidistribution properties on
nilmanifolds. This last step is easy to carry out when proving Theorem 1.1 (see Section 6), but
becomes much more cumbersome when proving Theorem 1.3. We prove the equidistribution
properties needed for Theorem 1.3 in Section 7.

1.3. Further directions. When $\ell=2$ and $p_1(n)=p_2(n)=n$, it is known that some sort of
commutativity assumption on the transformations $T_1,T_2$ has to be made in order for the limit
(2) to exist in $L^2(\mu)$ (see [9] for examples where convergence fails when $T_1$ and $T_2$ generate
solvable groups of exponential growth). On the other hand, it is not clear whether a similar
assumption is necessary when, say, $\ell=2$ and $p_1(n)=n$, $p_2(n)=n^2$. In fact, it could be the
case that for Theorems 1.1, 1.2, and 1.3 no commutativity assumption at all is needed.

Since convergence of the averages in (2) for functions $f_i$ that are $Z_{k,T_i}$-measurable can be shown for general
families of integer polynomials (see the argument in Section 6.5), it follows that the averages in
(2) converge in $L^2(\mu)$ for any collection of polynomials for which the conclusion of Theorem 1.2
holds. We conjecture that the conclusion of Theorem 1.2 holds if and only if the family of
polynomials $p_1,\ldots,p_\ell$ is pairwise independent, meaning that the set $\{1,p_i,p_j\}$ is linearly independent
for every $i,j\in\{1,\ldots,\ell\}$ with $i\neq j$ (simple examples show that the condition is necessary).
Furthermore, we conjecture that if the polynomials $1,p_1,\ldots,p_\ell$ are linearly independent, then
the factors $\mathcal{K}_{\mathrm{rat}}(T_i)$ can take the place of the factors $Z_{k,T_i}$ in the hypothesis of Theorem 1.2.
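Pairwise independence is a finite linear algebra condition on coefficient vectors. A minimal sketch, using a lowest-degree-first integer coefficient encoding (our choice for illustration; the function names are hypothetical), checks that $\{1,p_i,p_j\}$ has rank 3 over $\mathbb{Q}$ for every pair. Families of non-constant polynomials with distinct degrees, as in Theorem 1.2, always pass, while a linear family such as $\{n,2n\}$ does not.

```python
from fractions import Fraction
from itertools import combinations
from typing import List

Poly = List[int]  # integer coefficients, lowest degree first (illustrative encoding)


def rank_over_Q(rows: List[Poly]) -> int:
    """Rank over Q of the given coefficient vectors, by exact Gaussian elimination."""
    width = max(len(r) for r in rows)
    m = [[Fraction(c) for c in r] + [Fraction(0)] * (width - len(r)) for r in rows]
    rank = 0
    for col in range(width):
        pivot = next((i for i in range(rank, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for i in range(len(m)):
            if i != rank and m[i][col] != 0:
                factor = m[i][col] / m[rank][col]
                m[i] = [x - factor * y for x, y in zip(m[i], m[rank])]
        rank += 1
        if rank == len(m):
            break
    return rank


def pairwise_independent(polys: List[Poly]) -> bool:
    """True if {1, p_i, p_j} is linearly independent for every pair i != j."""
    one = [1]
    return all(rank_over_Q([one, p, q]) == 3 for p, q in combinations(polys, 2))


print(pairwise_independent([[0, 1], [0, 0, 1]]))     # {n, n^2}: True
print(pairwise_independent([[0, 1], [0, 2]]))        # {n, 2n}: False
print(pairwise_independent([[0, 0, 1], [1, 0, 2]]))  # {n^2, 2n^2 + 1}: False
```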

In the case where all the transformations $T_1,\ldots,T_\ell$ are equal, the conclusion of Theorem 1.3
is known to hold whenever the polynomials $n,n^2,\ldots,n^\ell$ are replaced by any family of linearly
independent polynomials $p_1,\ldots,p_\ell$, each having zero constant term [17] (it is known that this
independence assumption is necessary [7]). We conjecture that a similar result holds for any
family of commuting, invertible measure preserving transformations $T_1,\ldots,T_\ell$. In fact,
again, the assumption that the transformations $T_1,\ldots,T_\ell$ commute may be superfluous.

In most cases where the family of polynomials $p_1,\ldots,p_\ell$ is not pairwise independent, for
example when $p_1(n)=\cdots=p_\ell(n)=n^2$, the methods of the present article do not suffice
to study the limiting behavior of the averages (2).³ It is in cases like this that working with
some kind of "pleasant" extension (using terminology from [2]) or "magic" extension (using
terminology from [22]) of the system may offer an essential advantage (this is indeed the case
when all the polynomials are linear).



³ There are particular (but rather exceptional) cases of non-pairwise independent families of polynomials
where the methods of the present paper can be easily modified and combined with the known "linear" results
to prove convergence. One such example is when $p_1(n)=\cdots=p_{\ell-1}(n)=n$, and $p_\ell(n)$ is a polynomial with a
sufficiently large degree (degree $>2^\ell$ makes the problem accessible to the "simple" methods of the Appendix).
1.4. General conventions and notation. By a system we mean a Lebesgue probability
space $(X,\mathcal{X},\mu)$, endowed with a single invertible measure preserving transformation, or with
several commuting ones, acting on $X$.

For notational convenience, all functions are implicitly assumed to be real valued, but
straightforward modifications of our arguments, definitions, etc., can be given for complex
valued functions as well.

We say that the averages of the sequence $(a_n)_{n\in\mathbb{N}}$ converge to some limit $L$, and we write

$$\lim_{N-M\to\infty}\frac{1}{N-M}\sum_{n=M}^{N-1}a_n=L,$$

if the averages of $a_n$ on any sequence of intervals whose lengths tend to infinity converge to $L$.
We use similar formulations for the limsup and for limits in function spaces.
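This convention is more demanding than ordinary Cesàro convergence. The sketch below uses a hypothetical 0/1-valued sequence, equal to 1 exactly on the sparse blocks $[4^j,4^j+j)$: its averages over $[0,N)$ tend to 0, but its averages over the windows $[4^j,4^j+j)$, whose lengths tend to infinity, all equal 1, so the limit in the sense above does not exist.

```python
from fractions import Fraction


def a(n: int) -> int:
    """1 if n lies in a block [4^j, 4^j + j) for some j >= 1, and 0 otherwise."""
    j = 1
    while 4 ** j <= n:
        if n < 4 ** j + j:
            return 1
        j += 1
    return 0


def window_average(seq, M, N):
    """Average of seq(n) over M <= n < N, as an exact fraction."""
    return Fraction(sum(seq(n) for n in range(M, N)), N - M)


# Ordinary Cesaro averages over [0, N) are small ...
print(window_average(a, 0, 4 ** 8))  # 7/16384
# ... but the averages over [4^j, 4^j + j) stay equal to 1.
print([window_average(a, 4 ** j, 4 ** j + j) for j in range(1, 8)])
```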



Lastly, the following notation will be used throughout the article: $\mathbb{N}=\{1,2,\ldots\}$, $\mathbb{T}^k=\mathbb{R}^k/\mathbb{Z}^k$, $Tf=f\circ T$, $e(t)=e^{2\pi it}$.



2. Background in ergodic theory and nilmanifolds

2.1. Background in ergodic theory. Let $(X,\mathcal{X},\mu,T)$ be a system.

Factors. A homomorphism from $(X,\mathcal{X},\mu,T)$ to a system $(Y,\mathcal{Y},\nu,S)$ is a measurable map
$\pi\colon X'\to Y'$, where $X'$ is a $T$-invariant subset of $X$ and $Y'$ is an $S$-invariant subset of $Y$, both
of full measure, such that $\mu\circ\pi^{-1}=\nu$ and $S\circ\pi(x)=\pi\circ T(x)$ for $x\in X'$. When we have such a
homomorphism we say that the system $(Y,\mathcal{Y},\nu,S)$ is a factor of the system $(X,\mathcal{X},\mu,T)$. If the
factor map $\pi\colon X'\to Y'$ can be chosen to be bijective, then we say that the systems $(X,\mathcal{X},\mu,T)$
and $(Y,\mathcal{Y},\nu,S)$ are isomorphic (bijective maps on Lebesgue spaces have measurable inverses).

A factor can be characterized (modulo isomorphism) by $\pi^{-1}(\mathcal{Y})$, which is a $T$-invariant
sub-σ-algebra of $\mathcal{X}$, and conversely any $T$-invariant sub-σ-algebra of $\mathcal{X}$ defines a factor. By a classical
abuse of terminology we denote by the same letter the σ-algebra $\mathcal{Y}$ and its inverse image by
$\pi$. In other words, if $(Y,\mathcal{Y},\nu,S)$ is a factor of $(X,\mathcal{X},\mu,T)$, we think of $\mathcal{Y}$ as a sub-σ-algebra of
$\mathcal{X}$. A factor can also be characterized (modulo isomorphism) by a $T$-invariant subalgebra $\mathcal{F}$ of
$L^\infty(X,\mathcal{X},\mu)$, in which case $\mathcal{Y}$ is the sub-σ-algebra generated by $\mathcal{F}$, or equivalently, $L^2(X,\mathcal{Y},\mu)$
is the closure of $\mathcal{F}$ in $L^2(X,\mathcal{X},\mu)$.

Inverse limits. We say that $(X,\mathcal{X},\mu,T)$ is an inverse limit of a sequence of factors $(X,\mathcal{X}_j,\mu,T)$
if $(\mathcal{X}_j)_{j\in\mathbb{N}}$ is an increasing sequence of $T$-invariant sub-σ-algebras such that $\bigvee_{j\in\mathbb{N}}\mathcal{X}_j=\mathcal{X}$ up
to sets of measure zero.

Conditional expectation. If $\mathcal{Y}$ is a $T$-invariant sub-σ-algebra of $\mathcal{X}$ and $f\in L^1(\mu)$, we write
$\mathbb{E}(f|\mathcal{Y})$, or $\mathbb{E}_\mu(f|\mathcal{Y})$ if needed, for the conditional expectation of $f$ with respect to $\mathcal{Y}$. We will
frequently make use of the identities

$$\int\mathbb{E}(f|\mathcal{Y})\,d\mu=\int f\,d\mu\quad\text{and}\quad T\,\mathbb{E}(f|\mathcal{Y})=\mathbb{E}(Tf|\mathcal{Y}).$$

We say that a function $f$ is orthogonal to $\mathcal{Y}$, and we write $f\perp\mathcal{Y}$, when it has a zero conditional
expectation on $\mathcal{Y}$. If a function $f\in L^\infty(\mu)$ is measurable with respect to the factor $\mathcal{Y}$, we write
$f\in L^\infty(\mathcal{Y},\mu)$.
Ergodic decomposition. We write $\mathcal{I}$, or $\mathcal{I}(T)$ if needed, for the σ-algebra $\{A\in\mathcal{X}\colon T^{-1}A=A\}$
of invariant sets. A system is ergodic if all the $T$-invariant sets have measure either 0 or 1.

Let $x\mapsto\mu_x$ be a regular version of the conditional measures with respect to the σ-algebra $\mathcal{I}$.
This means that the map $x\mapsto\mu_x$ is $\mathcal{I}$-measurable, and for every bounded measurable function
$f$ we have

$$\mathbb{E}_\mu(f|\mathcal{I})(x)=\int f\,d\mu_x\quad\text{for }\mu\text{-almost every }x\in X.$$

Then the ergodic decomposition of $\mu$ is

$$\mu=\int\mu_x\,d\mu(x).$$

The measures $\mu_x$ have the additional property that for $\mu$-almost every $x\in X$ the system
$(X,\mathcal{X},\mu_x,T)$ is ergodic.

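The rational Kronecker factor. For every $d\in\mathbb{N}$ we define $\mathcal{K}_d=\mathcal{I}(T^d)$. The rational Kronecker
factor is

$$\mathcal{K}_{\mathrm{rat}}=\bigvee_{d\in\mathbb{N}}\mathcal{K}_d.$$

We write $\mathcal{K}_{\mathrm{rat}}(T)$, or $\mathcal{K}_{\mathrm{rat}}(T,\mu)$, when needed. This factor is spanned by the family of functions

$$\{f\in L^\infty(\mu)\colon T^df=f\text{ for some }d\in\mathbb{N}\},$$

or, equivalently, by the family

$$\{f\in L^\infty(\mu)\colon Tf=e(\alpha)\cdot f\text{ for some }\alpha\in\mathbb{Q}\}.$$

If $\mathbb{E}_\mu(f|\mathcal{K}_{\mathrm{rat}}(T,\mu))=0$, then we have, for $\mu$-almost every $x\in X$, that $\mathbb{E}_{\mu_x}(f|\mathcal{K}_{\mathrm{rat}}(T,\mu_x))=0$
(see Lemma 3.2 in [17]).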

2.2. The seminorms $|||\cdot|||_k$ and the factors $Z_k$. Sections 3 and 4 of [23] contain constructions
that associate to every ergodic system a sequence of measures, seminorms, and factors. The
hypothesis of ergodicity is not needed for these constructions: most properties
remain valid, and can be proved in exactly the same manner, for general, not necessarily ergodic,
systems. We review the definitions and results we need in the sequel.

Let $(X,\mathcal{X},\mu,T)$ be a system. We write $\mu=\int\mu_x\,d\mu(x)$ for the ergodic decomposition of $\mu$.

Definition of the seminorms. For every $k\ge1$, we define a measure $\mu^{[k]}$ on $X^{2^k}$, invariant under
$T\times T\times\cdots\times T$ ($2^k$ times), by

$$\mu^{[1]}=\mu\times_{\mathcal{I}(T)}\mu=\int\mu_x\times\mu_x\,d\mu(x);$$

$$\text{for }k\ge1,\quad\mu^{[k+1]}=\mu^{[k]}\times_{\mathcal{I}(T\times T\times\cdots\times T)}\mu^{[k]}.$$

Writing $\mathbf{x}=(x_0,x_1,\ldots,x_{2^k-1})$ for a point of $X^{2^k}$, we define a seminorm $|||\cdot|||_k$ on $L^\infty(\mu)$ by

$$|||f|||_k=\Big(\int\prod_{j=0}^{2^k-1}f(x_j)\,d\mu^{[k]}(\mathbf{x})\Big)^{1/2^k}.$$

That $|||\cdot|||_k$ is a seminorm can be proved as in [23], and also follows from the estimate (7) below.
If needed, we are going to write $|||\cdot|||_{k,\mu}$, or $|||\cdot|||_{k,T}$.
By the inductive definition of the measures $\mu^{[k]}$ we have

$$|||f|||_1=\|\mathbb{E}(f|\mathcal{I})\|_{L^2(\mu)};\tag{5}$$

$$|||f|||_{k+1}^{2^{k+1}}=\lim_{N-M\to\infty}\frac{1}{N-M}\sum_{n=M}^{N-1}|||f\cdot T^nf|||_k^{2^k}.\tag{6}$$

This can be considered as an alternate definition of the seminorms (assuming one first
establishes existence of the limit in (6)).
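As a worked instance of (5) and (6), added here for illustration, let $T$ be the rotation $x\mapsto x+\alpha$ on $\mathbb{T}$ with $\alpha$ irrational and Haar measure $m_{\mathbb{T}}$, and let $f(x)=\sum_jc_je(jx)$ be a real trigonometric polynomial, so that $c_{-j}=\overline{c_j}$. Since $T$ is ergodic, (5) reads $|||g|||_1=|\int g\,dm_{\mathbb{T}}|$, and therefore

$$\int f\cdot T^nf\,dm_{\mathbb{T}}=\sum_j|c_j|^2e(jn\alpha),\qquad|||f\cdot T^nf|||_1^2=\sum_{j,l}|c_j|^2|c_l|^2e((j-l)n\alpha).$$

Averaging over $n$ in (6) with $k=1$ removes every term with $j\neq l$, because then $(j-l)\alpha\notin\mathbb{Z}$, so

$$|||f|||_2^4=\sum_j|c_j|^4.$$

For example, $f(x)=\cos(2\pi x)$ has $c_{\pm1}=1/2$, which gives $|||f|||_2^4=1/8$.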

For functions $f_0,f_1,\ldots,f_{2^k-1}\in L^\infty(\mu)$, the next inequality ([23], Lemma 3.9) follows from
the definition of the measures by a repeated use of the Cauchy–Schwarz inequality:

$$\Big|\int\prod_{i=0}^{2^k-1}f_i(x_i)\,d\mu^{[k]}(\mathbf{x})\Big|\le\prod_{i=0}^{2^k-1}|||f_i|||_k.\tag{7}$$

Seminorms and ergodic decomposition. By induction, for every $k\in\mathbb{N}$ we have

$$\mu^{[k]}=\int\mu_x^{[k]}\,d\mu(x).\tag{8}$$

Therefore, for every function $f\in L^\infty(\mu)$ we have

$$|||f|||_{k,\mu}^{2^k}=\int|||f|||_{k,\mu_x}^{2^k}\,d\mu(x).\tag{9}$$

The factors $Z_k$. For every $k\ge1$, an invariant σ-algebra $Z_{k-1}$ on $X$ is constructed exactly as in
Section 4 of [23]. It satisfies the same property as in Lemma 4.3 of [23]:

$$\text{for }f\in L^\infty(\mu),\quad\mathbb{E}_\mu(f|Z_{k-1})=0\ \text{ if and only if }\ |||f|||_k=0.\tag{10}$$

Equivalently, one has

$$L^\infty(Z_{k-1},\mu)=\Big\{f\in L^\infty(\mu)\colon\int f\cdot g\,d\mu=0\text{ for every }g\in L^\infty(\mu)\text{ with }|||g|||_k=0\Big\}.\tag{11}$$

In particular, if $f\in L^\infty(\mu)$ is measurable with respect to $Z_{k-1}$ and satisfies $|||f|||_k=0$, then
$f=0$. Therefore,

$$|||\cdot|||_k\text{ is a norm on }L^\infty(Z_{k-1},\mu).$$
If further clarification is needed, we are going to write $Z_{k,\mu}$, or $Z_{k,T}$. If $f\in L^\infty(\mu)$, then it
follows from (9) and (10) that

$$\mathbb{E}_\mu(f|Z_{k,\mu})=0\ \text{ if and only if }\ \mathbb{E}_{\mu_x}(f|Z_{k,\mu_x})=0\ \text{ for }\mu\text{-almost every }x\in X.\tag{12}$$

Furthermore, if $f\in L^\infty(\mu)$, then

$$f\in L^\infty(Z_{k,\mu},\mu)\ \text{ if and only if }\ f\in L^\infty(Z_{k,\mu_x},\mu_x)\ \text{ for }\mu\text{-almost every }x\in X.$$

The first implication is non-trivial to establish, though, due to various measurability problems.
We prove it in Corollary 3.3 below.

For every $\ell\in\mathbb{N}$ one has $|||f|||_{1,T^\ell}\ll_\ell|||f|||_{2,T}$ (see the proof of Proposition 2 in [M]). Using this and
the inductive definition (6) of the seminorms, one sees that $|||f|||_{k,T^\ell}\ll_{k,\ell}|||f|||_{k+1,T}$. Therefore,

$$\text{if }f\perp Z_{k,T},\ \text{then }f\perp Z_{k-1,T^\ell}\ \text{for every }\ell\in\mathbb{N}.\tag{13}$$
2.3. Nilsystems, nilsequences, the structure of $Z_k$. A nilmanifold is a homogeneous space
$X=G/\Gamma$, where $G$ is a nilpotent Lie group and $\Gamma$ is a discrete cocompact subgroup of $G$. If
$G_{k+1}=\{e\}$, where $G_k$ denotes the $k$-th commutator subgroup of $G$, we say that $X$ is a $k$-step
nilmanifold.

A $k$-step nilpotent Lie group $G$ acts on $G/\Gamma$ by left translation, where the translation by a
fixed element $a\in G$ is given by $T_a(g\Gamma)=(ag)\Gamma$. By $m_X$ we denote the unique probability
measure on $X$ that is invariant under the action of $G$ by left translations (called the normalized
Haar measure), and by $\mathcal{G}/\Gamma$ we denote the Borel σ-algebra of $G/\Gamma$. Fixing an element $a\in G$,
we call the system $(G/\Gamma,\mathcal{G}/\Gamma,m_X,T_a)$ a $k$-step nilsystem.

If $X=G/\Gamma$ is a $k$-step nilmanifold, $a\in G$, $x\in X$, and $f\in C(X)$, we call the sequence
$(f(a^nx))_{n\in\mathbb{N}}$ a basic $k$-step nilsequence. A $k$-step nilsequence is a uniform limit of basic $k$-step
nilsequences. As is easily verified, the collection of $k$-step nilsequences, with the topology of
uniform convergence, forms a closed algebra. We caution the reader that in other articles the
term $k$-step nilsequence is used for what we call here a basic $k$-step nilsequence, and in some
instances the function $f$ is assumed to satisfy weaker or stronger conditions than continuity.
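A concrete example may help; it is our illustration and is not taken from the text. The map $(x,y)\mapsto(x+\alpha,y+x)$ on $\mathbb{T}^2$ is a standard example of a 2-step nilsystem (a translation on a quotient of a 2-step nilpotent group of affine maps of $\mathbb{T}^2$; we take this identification as known). Iterating gives $y_n=y_0+nx_0+\binom{n}{2}\alpha\pmod 1$, so for continuous $F$ on $\mathbb{T}^2$, for instance $F(x,y)=\cos(2\pi y)$, the sequence $F(x_n,y_n)$ is a basic 2-step nilsequence with a quadratic phase. The sketch checks the orbit formula with exact rational arithmetic; the rational parameter values are hypothetical and are used only so that the check is exact, since the formula holds for every $\alpha$.

```python
from fractions import Fraction


def skew_orbit(x0, y0, alpha, n_steps):
    """First n_steps points of the orbit of (x0, y0) under (x, y) -> (x + alpha, y + x) mod 1."""
    x, y = x0, y0
    orbit = []
    for _ in range(n_steps):
        orbit.append((x, y))
        x, y = (x + alpha) % 1, (y + x) % 1  # the right-hand side uses the old x
    return orbit


def y_closed_form(x0, y0, alpha, n):
    """y_n = y0 + n*x0 + n(n-1)/2 * alpha (mod 1): a quadratic polynomial in n."""
    return (y0 + n * x0 + Fraction(n * (n - 1), 2) * alpha) % 1


x0, y0, alpha = Fraction(1, 7), Fraction(2, 5), Fraction(3, 11)
orbit = skew_orbit(x0, y0, alpha, 50)
print(all(orbit[n][1] == y_closed_form(x0, y0, alpha, n) for n in range(50)))  # True
```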

The connection between the factors $Z_k$ of a given ergodic system and nilsystems is given by
the following structure theorem ([23], Lemma 4.3, Definition 4.10, and Theorem 10.1):

Theorem 2.1 ([23]). Let $(X,\mathcal{X},\mu,T)$ be an ergodic system and $k\in\mathbb{N}$.

Then the system $(X,Z_k,\mu,T)$ is a (measure theoretic) inverse limit of $k$-step nilsystems.

Remark. In fact, in [26] it is shown that, for ergodic systems, the factor $(X,Z_k,\mu,T)$ is
(measurably) isomorphic to a topological inverse limit of ergodic $k$-step nilsystems (for a definition
see [26]). We are going to use this fact later.

2.4. Characteristic factors for linear averages. Using successive applications of van der
Corput's lemma, the following can be proved by induction on $\ell$ as in Theorem 12.1 of [23] (the
$\ell=2$ case follows, for example, from Theorem 2.1 in [21]):

Theorem 2.2. Let $\ell\ge2$, $(X,\mathcal{X},\mu,T)$ be a system, $f_1,\ldots,f_\ell\in L^\infty(\mu)$, and $a_1,\ldots,a_\ell$ be
distinct non-zero integers. Suppose that $f_i\perp Z_{\ell-1}$ for some $i\in\{1,\ldots,\ell\}$.
Then the averages

$$\frac{1}{N-M}\sum_{n=M}^{N-1}f_1(T^{a_1n}x)\cdot\ldots\cdot f_\ell(T^{a_\ell n}x)$$

converge to 0 in $L^2(\mu)$ as $N-M\to\infty$.

3. A KEY APPROXIMATION PROPERTY 

In this section we are going to prove the following key approximation result: 

Proposition 3.1. Let $(X,\mathcal{X},\mu,T)$ be a system (not necessarily ergodic) and suppose that
$f\in L^\infty(Z_k,\mu)$ for some $k\in\mathbb{N}$.

Then for every $\varepsilon>0$ there exists a function $\tilde f\in L^\infty(\mu)$, with $L^\infty$-norm bounded by $\|f\|_{L^\infty(\mu)}$,
such that

(i) $\tilde f\in L^\infty(Z_k,\mu)$ and $\|f-\tilde f\|_{L^1(\mu)}\le\varepsilon$;

(ii) for $\mu$-almost every $x\in X$, the sequence $(\tilde f(T^nx))_{n\in\mathbb{N}}$ is a $k$-step nilsequence.

A topological dynamical system is a pair $(Y,S)$, where $Y$ is a compact metric space and $S\colon Y\to Y$ is
a continuous transformation. If $(Y_i,S_i)_{i\in\mathbb{N}}$ is a sequence of topological dynamical systems and $\pi_i\colon Y_{i+1}\to Y_i$
are factor maps, the inverse limit of the systems is defined to be the compact subset $Y$ of $\prod_{i\in\mathbb{N}}Y_i$ given by
$Y=\{(y_i)_{i\in\mathbb{N}}\colon\pi_i(y_{i+1})=y_i\}$, with the induced infinite product metric and the continuous transformation $T$
given by $T((y_i)_{i\in\mathbb{N}})=(S_iy_i)_{i\in\mathbb{N}}$.

By [28], if $(a_n)_{n\in\mathbb{N}}$ is a $k$-step nilsequence, and $p\in\mathbb{Z}[t]$ is a polynomial of degree $d\ge1$,
then the sequence $(a_{p(n)})_{n\in\mathbb{N}}$ is a $(dk)$-step nilsequence. Therefore, the function $\tilde f$ given by the
proposition satisfies:

(iii) for $\mu$-almost every $x\in X$ and every polynomial $p\in\mathbb{Z}[t]$ of degree $d\ge1$, the sequence
$(\tilde f(T^{p(n)}x))_{n\in\mathbb{N}}$ is a $(dk)$-step nilsequence.

If the system $(X,\mathcal{X},\mu,T)$ is ergodic, then one can deduce Proposition 3.1 from Theorem 2.1
in a straightforward way. It turns out to be much harder to prove this result in the
non-ergodic case (and this strengthening is crucial for our later applications), due to a non-trivial
measurable selection problem that one has to overcome. We give the proof in the following
subsections.

3.1. Dual functions. In this subsection, $(X,\mathcal{X},\mu,T)$ is a system, and the ergodic
decomposition of $\mu$ is $\mu=\int\mu_x\,d\mu(x)$. We remind the reader that we work with real valued functions
only.

We define a family of functions that will be used in the proof of Proposition 3.1 and gather
some basic properties they satisfy.

When $f$ is a bounded measurable function on $X$, for every $N\in\mathbb{N}$, we write

$$A_N(f)(x)=\frac{1}{N^k}\sum_{1\le n_1,\ldots,n_k\le N}\ \prod_{\substack{\epsilon\in\{0,1\}^k\\ \epsilon\neq00\cdots0}}f(T^{\epsilon_1n_1+\cdots+\epsilon_kn_k}x).$$

It is known by Theorem 1.2 in [23] that the averages $A_N(f)$ converge in $L^2(\mu)$ (in fact, by [1]
they converge pointwise, but we do not need this strengthening), and we define

$$\mathcal{D}_kf=\lim_{N\to\infty}A_N(f),$$

where the limit is taken in $L^2(\mu)$. If needed, we write $\mathcal{D}_{k,\mu}f$. The function $\mathcal{D}_kf$ satisfies
([…], Theorem 13.1): for every $g\in L^\infty(\mu)$, we have

$$\int g\cdot\mathcal{D}_kf\,d\mu=\int g(x_0)\prod_{i=1}^{2^k-1}f(x_i)\,d\mu^{[k]}(\mathbf{x}),\tag{14}$$

where $\mathbf{x}=(x_0,x_1,\ldots,x_{2^k-1})\in X^{2^k}$. In particular, by the definition of $|||f|||_k$, we have

$$\int f\cdot\mathcal{D}_kf\,d\mu=|||f|||_k^{2^k},\tag{15}$$

and by inequality (7), for every function $g\in L^\infty(\mu)$ we have

$$\Big|\int g\cdot\mathcal{D}_kf\,d\mu\Big|\le|||g|||_k\cdot|||f|||_k^{2^k-1}.\tag{16}$$

Example 1. We have

$$\mathcal{D}_1f(x)=\lim_{N\to\infty}\frac{1}{N}\sum_{n_1=1}^{N}f(T^{n_1}x)=\mathbb{E}(f|\mathcal{I})(x).$$

Also,

$$\mathcal{D}_2f(x)=\lim_{N\to\infty}\frac{1}{N^2}\sum_{1\le n_1,n_2\le N}f(T^{n_1}x)\cdot f(T^{n_2}x)\cdot f(T^{n_1+n_2}x).$$

If $T$ is an ergodic rotation on the circle $\mathbb{T}$ with the Haar measure $m_{\mathbb{T}}$, then an easy computation
gives

$$(\mathcal{D}_2f)(x)=\int_{\mathbb{T}}\int_{\mathbb{T}}f(x+s)\cdot f(x+t)\cdot f(x+s+t)\,dm_{\mathbb{T}}(s)\,dm_{\mathbb{T}}(t).$$

Notice that in this case the function $\mathcal{D}_2f$ may be non-constant, and no matter whether the
function $f$ is continuous or not, the function $\mathcal{D}_2f$ is continuous on $\mathbb{T}$.
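The formula in Example 1 can be checked numerically. A minimal sketch, assuming the illustrative choice $f(x)=\cos(2\pi x)+\tfrac12\cos(4\pi x)$: writing $f=\sum_kc_ke(kx)$, integration over $s$ and $t$ keeps only the terms $e(a(x+s))\,e(b(x+t))\,e(c(x+s+t))$ with $a=b=-c$, which gives $\mathcal{D}_2f=\sum_k|c_k|^2c_ke(kx)=\tfrac14\cos(2\pi x)+\tfrac1{32}\cos(4\pi x)$, a non-constant continuous function; identity (15) then predicts $|||f|||_2^4=\sum_k|c_k|^4=17/128$. Uniform Riemann sums on a 16-point grid are exact for trigonometric polynomials of this degree, so the check is exact up to rounding.

```python
import math

M = 16  # grid size; uniform Riemann sums integrate trig polynomials of degree < M exactly


def f(x: float) -> float:
    """A real trigonometric polynomial on T = R/Z (chosen for illustration)."""
    return math.cos(2 * math.pi * x) + 0.5 * math.cos(4 * math.pi * x)


def dual_D2(x: float) -> float:
    """(D_2 f)(x) = int int f(x+s) f(x+t) f(x+s+t) ds dt for an ergodic circle rotation."""
    total = 0.0
    for i in range(M):
        for j in range(M):
            s, t = i / M, j / M
            total += f(x + s) * f(x + t) * f(x + s + t)
    return total / M**2


def dual_D2_closed_form(x: float) -> float:
    # D_2 f = sum_k |c_k|^2 c_k e(kx), with c_{±1} = 1/2 and c_{±2} = 1/4.
    return math.cos(2 * math.pi * x) / 4 + math.cos(4 * math.pi * x) / 32


# Identity (15) with k = 2: int f * D_2 f = |||f|||_2^4 = sum_k |c_k|^4 = 17/128.
norm4 = sum(f(i / M) * dual_D2(i / M) for i in range(M)) / M
print(norm4, 17 / 128)
```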

We gather some additional basic properties of dual functions. 

Proposition 3.2. Let $(X,\mathcal{X},\mu,T)$ be a system.

Then for every $f\in L^\infty(\mu)$ and $k\in\mathbb{N}$ the following hold:

(i) For $\mu$-almost every $x\in X$, we have $\mathcal{D}_{k,\mu}f=\mathcal{D}_{k,\mu_x}f$ as functions of $L^\infty(\mu_x)$.

(ii) The function $\mathcal{D}_kf$ is $Z_{k-1}$-measurable; in fact, $\mathcal{D}_kf=\mathcal{D}_k\tilde f$ where $\tilde f=\mathbb{E}(f|Z_{k-1})$.

(iii) Linear combinations of functions $\mathcal{D}_kf$ with $f\in L^\infty(\mu)$ are dense in $L^1(Z_{k-1},\mu)$.

Proof. We show (i). The averages $A_N(f)$ converge to $\mathcal{D}_{k,\mu}f$ in $L^2(\mu)$. Therefore, there exists
a subsequence of $A_N(f)$ that converges to $\mathcal{D}_{k,\mu}f$ almost everywhere with respect to $\mu$. As a
consequence, for $\mu$-almost every $x\in X$, this subsequence converges to $\mathcal{D}_{k,\mu}f$ almost everywhere
with respect to $\mu_x$. On the other hand, by the definition of $\mathcal{D}_{k,\mu_x}f$, for $\mu$-almost every $x\in X$
this subsequence also converges to $\mathcal{D}_{k,\mu_x}f$ in $L^2(\mu_x)$. The result follows.

We show (ii). Since the operation $f\mapsto\mathcal{D}_kf$ maps $L^\infty(Z_{k-1},\mu)$ to itself, it suffices to establish
the second claim. Let $g\in L^\infty(\mu)$. Using (14) and expanding $f$ as $\tilde f+(f-\tilde f)$, we see that
$\int g\cdot\mathcal{D}_kf\,d\mu$ is equal to $\int g\cdot\mathcal{D}_k\tilde f\,d\mu$, plus integrals of the form

$$\int g(x_0)\prod_{i=1}^{2^k-1}f_i(x_i)\,d\mu^{[k]}(\mathbf{x}),$$

where each of the functions $f_i$ is equal to either $\tilde f$ or $f-\tilde f$, and at least one of the functions $f_i$
is equal to $f-\tilde f$. Since $\mathbb{E}(f-\tilde f|Z_{k-1})=0$, by (10) we have $|||f-\tilde f|||_k=0$, and by inequality (7),
all these integrals are equal to zero. This establishes that $\int g\cdot\mathcal{D}_kf\,d\mu$ is equal to $\int g\cdot\mathcal{D}_k\tilde f\,d\mu$,
and the announced result follows.

We show (iii). By duality, it suffices to show that if $g\in L^\infty(Z_{k-1},\mu)$ satisfies $\int g\cdot\mathcal{D}_kf\,d\mu=0$
for every $f\in L^\infty(\mu)$, then $g=0$. Taking $f=g$ gives $\int g\cdot\mathcal{D}_kg\,d\mu=0$, and using (15) we get
$|||g|||_k=0$. Since $|||\cdot|||_k$ is a norm on $L^\infty(Z_{k-1},\mu)$, we deduce that $g=0$. This completes the
proof. □

Corollary 3.3. Let $(X,\mathcal{X},\mu,T)$ be a system and $f\in L^\infty(Z_{k,\mu},\mu)$ for some $k\in\mathbb{N}$.
Then, for $\mu$-almost every $x\in X$, we have $f\in L^\infty(Z_{k,\mu_x},\mu_x)$.

Proof. By part (iii) of Proposition 3.2, there exists a sequence $(f_n)_{n\in\mathbb{N}}$ of finite linear
combinations of functions of the form $\mathcal{D}_{k+1}\phi$, where $\phi\in L^\infty(\mu)$, such that $f_n\to f$ in $L^1(\mu)$. Passing
to a subsequence, we can assume that $f_n\to f$ almost everywhere with respect to $\mu$. As a
consequence, for $\mu$-almost every $x\in X$, we have $f_n\to f$ almost everywhere with respect to $\mu_x$.
Furthermore, by parts (i) and (ii) of Proposition 3.2, we have that $f_n\in L^\infty(Z_{k,\mu_x},\mu_x)$ for
$\mu$-almost every $x$ and every $n\in\mathbb{N}$. The announced result follows. □

3.2. Proof of Proposition 3.1. In order to prove Proposition 3.1 we are going to make
use of two ingredients. The first is a continuity property of dual functions (it follows from
Proposition 5.2 and Lemma 5.8 in [26]):

Theorem 3.4 ([26]). Suppose that the topological dynamical system $(Y,S)$ is a topological
inverse limit of minimal $(k-1)$-step nilsystems, and let $m_Y$ be the unique $S$-invariant measure
on $Y$.

Then for every $f\in L^\infty(m_Y)$, and $k\in\mathbb{N}$, the function $\mathcal{D}_kf$ coincides $m_Y$-almost everywhere
with a continuous function on $Y$.

Let $(X,\mathcal{X},\mu,T)$ be an ergodic system and let $(Y_k,\mathcal{Y}_k,m_k,S_k)$ be a topological inverse limit
of minimal nilsystems that is measure theoretically isomorphic to the factor $(X,Z_k,\mu,T)$ (see
the remark following Theorem 2.1). With $\pi_k\colon X\to Y_k$ we denote the measure preserving
isomorphism that identifies $(X,Z_k,\mu,T)$ with $(Y_k,\mathcal{Y}_k,m_k,S_k)$. Using this notation we have:

Corollary 3.5. Let $(X,\mathcal{X},\mu,T)$ be an ergodic system, $f\in L^\infty(\mu)$, and $k\in\mathbb{N}$.

Then there exists $g\in C(Y_{k-1})$ such that $\mathcal{D}_kf$ coincides $\mu$-almost everywhere with the function
$g\circ\pi_{k-1}$.

Proof. By part (ii) of Proposition 3.2, we have that $\mathcal{D}_kf=\mathcal{D}_k\tilde f$ where $\tilde f=\mathbb{E}(f|Z_{k-1})$.
Therefore, we can assume that $f\in L^\infty(Z_{k-1},\mu)$. Writing $f=\phi\circ\pi_{k-1}$ for some $\phi\in L^\infty(m_{k-1})$, we
have $\mathcal{D}_kf=(\mathcal{D}_k\phi)\circ\pi_{k-1}$. The announced result now follows from Theorem 3.4. □

The second ingredient is Theorem 1.1 of [26], which gives a characterization of nilsequences
that uses only local information about the sequence. Giving the exact statement here would
require introducing definitions and notation that we are not going to use in the sequel, so
we only state an immediate consequence that we need.

Theorem 3.6 ([26]). Let $(a_s(n))_{n\in\mathbb{N}}$, $s\in S$, be a collection of sequences indexed by a set $S$.

Then for every $k\in\mathbb{N}$ the set of $s\in S$ for which the sequence $(a_s(n))_{n\in\mathbb{N}}$ is a $k$-step
nilsequence belongs to the σ-algebra spanned by sets of the form $A_{l,m,n}=\{s\in S\colon|a_s(m)-a_s(n)|<1/l\}$,
where $l,m,n\in\mathbb{N}$.

Using this, we immediately deduce the following measurability property:

Corollary 3.7. Let $(X,\mathcal{X},\mu,T)$ be a system, $f\in L^\infty(\mu)$, and $k\in\mathbb{N}$.

Then the set $A_f=\{x\in X\colon(f(T^nx))_{n\in\mathbb{N}}\text{ is a }k\text{-step nilsequence}\}$ is measurable.

We are now ready for the proof of Proposition 3.1.

Proof of Proposition 3.1. First notice that if a function $f\in L^\infty(\mu)$ satisfies properties (i) and (ii), then the function $g=\min(|f|,\|f\|_{L^\infty(\mu)})\cdot\operatorname{sign}(f)$ has $L^\infty$-norm bounded by $\|f\|_{L^\infty(\mu)}$ and still satisfies properties (i) and (ii) (here we used that $(\min(|a_n|,M)\cdot\operatorname{sign}(a_n))_{n\in\mathbb{N}}$ is a nilsequence if $(a_n)_{n\in\mathbb{N}}$ is). So it suffices to find $f\in L^\infty(\mu)$ that satisfies properties (i) and (ii).

Since, by part (iii) of Proposition 3.2, for every $k\in\mathbb{N}$ linear combinations of functions of the form $\mathcal{D}_{k+1,\mu}\phi$ with $\phi\in L^\infty(\mu)$ are dense in $L^1(\mathcal{Z}_k,\mu)$, we can assume that $f$ is of the form $\mathcal{D}_{k+1,\mu}\phi$ for some $\phi\in L^\infty(\mu)$. Hence, it suffices to show that for every $\phi\in L^\infty(\mu)$ we have $\mu(A_{\mathcal{D}_{k+1,\mu}\phi})=1$, where

$$A_{\mathcal{D}_{k+1,\mu}\phi}=\{x\in X : ((\mathcal{D}_{k+1,\mu}\phi)(T^nx))_{n\in\mathbb{N}} \text{ is a } k\text{-step nilsequence}\}.$$

Let $\mu=\int\mu_x\,d\mu(x)$ be the ergodic decomposition of the measure $\mu$. Since, by Corollary 3.7, the set $A_{\mathcal{D}_{k+1,\mu}\phi}$ is $\mu$-measurable, it suffices to show that $\mu_x(A_{\mathcal{D}_{k+1,\mu}\phi})=1$ for $\mu$-almost every $x\in X$. By part (i) of Proposition 3.2 we have, for $\mu$-almost every $x\in X$, that $\mathcal{D}_{k+1,\mu}\phi=\mathcal{D}_{k+1,\mu_x}\phi$ as functions of $L^\infty(\mu_x)$. As a consequence, it remains to show that $\mu_x(A_{\mathcal{D}_{k+1,\mu_x}\phi})=1$ for $\mu$-almost every $x\in X$.

We have therefore reduced matters to establishing that $\mu(A_{\mathcal{D}_{k+1,\mu}\phi})=1$ for ergodic systems and $\phi\in L^\infty(\mu)$. Using Corollary 3.5 and the notation introduced there, we get that there exists a function $g\in C(Y_k)$ such that for $\mu$-almost every $x\in X$ we have

$$(\mathcal{D}_{k+1,\mu}\phi)(x)=g(\pi_k x).$$

As a consequence (since $\pi_k\circ T=S_k\circ\pi_k$ $\mu$-almost everywhere and $\mu$ is $T$-invariant), for $\mu$-almost every $x\in X$ we have

$$(\mathcal{D}_{k+1,\mu}\phi)(T^nx)=g(S_k^n\pi_k x)\quad\text{for every } n\in\mathbb{N}.$$

Since $(Y_k,S_k)$ is a topological inverse limit of nilsystems and $g\in C(Y_k)$, for every $y\in Y_k$ the sequence $(g(S_k^ny))_{n\in\mathbb{N}}$ is a $k$-step nilsequence. We conclude that indeed $\mu(A_{\mathcal{D}_{k+1,\mu}\phi})=1$. This completes the proof. □

4. A CHARACTERISTIC FACTOR FOR THE HIGHEST DEGREE ITERATE: TWO TRANSFORMATIONS

In this section and the next one, we prove Theorem 1.2 under the additional assumption that the function corresponding to the highest degree polynomial iterate satisfies the stated orthogonality assumption. For example, if $\deg(p_1)>\deg(p_i)$ for $i=2,\ldots,\ell$, we assume that $f_1\perp\mathcal{Z}_{k,T_1}$ for some $k\in\mathbb{N}$.

In fact our method requires that we prove a more general result (Proposition 5.1). This result is also going to be used in Section 6 when we deal with the polynomials of lower degree.

However, since the proof is notationally heavy, we present it first in the case of two commuting 
transformations. In the next section we give a sketch of the proof for the general case, focusing 
on the few points where the differences are significant. 

In this section, we show: 

Proposition 4.1. Let $(X,\mathcal{X},\mu,T_1,T_2)$ be a system and $f_1,\ldots,f_m\in L^\infty(\mu)$. Let $(\mathcal{P},\mathcal{Q})$ be a nice ordered family of pairs of polynomials with degree $d$ (all notions are defined in Sections 4.2 and 4.3).

Then there exists $k=k(d,m)\in\mathbb{N}$ such that: if $f_1\perp\mathcal{Z}_{k,T_1}$, then the averages

$$\frac{1}{N-M}\sum_{n=M}^{N-1} f_1(T_1^{p_1(n)}T_2^{q_1(n)}x)\cdot\ldots\cdot f_m(T_1^{p_m(n)}T_2^{q_m(n)}x) \tag{17}$$

converge to $0$ in $L^2(\mu)$.

Applying this to the nice family $(\mathcal{P},\mathcal{Q})$ where $\mathcal{P}=(p_1,0)$ and $\mathcal{Q}=(0,p_2)$, we get:

Corollary 4.2. Let $(X,\mathcal{X},\mu,T_1,T_2)$ be a system and $f_1,f_2\in L^\infty(\mu)$. Let $p_1$ and $p_2$ be integer polynomials with $d=\deg(p_1)>\deg(p_2)$.

Then there exists $k=k(d)$ such that, if $f_1\perp\mathcal{Z}_{k,T_1}$, then the averages

$$\frac{1}{N-M}\sum_{n=M}^{N-1} f_1(T_1^{p_1(n)}x)\cdot f_2(T_2^{p_2(n)}x)$$

converge to $0$ in $L^2(\mu)$.

4.1. A simple example. We give here a very simple example in order to explain our strategy. In the appendix we consider slightly more general averages and get more precise results (the main drawback of these simpler arguments is that they do not allow us to treat arbitrary pairs of polynomials with distinct degrees). Let $(X,\mathcal{X},\mu,T_1,T_2)$ be a system and $f_1,f_2\in L^\infty(\mu)$.

Claim. If $f_1\perp\mathcal{Z}_{2,T_1}$, then the averages

$$\frac{1}{N-M}\sum_{n=M}^{N-1} f_1(T_1^{n^2}x)\cdot f_2(T_2^{n}x) \tag{18}$$

converge to $0$ in $L^2(\mu)$.

Using van der Corput's Lemma it suffices to show that for every $h_1\in\mathbb{N}$, the averages in $n$ of

$$\int f_1(T_1^{n^2}x)\cdot f_2(T_2^{n}x)\cdot f_1(T_1^{(n+h_1)^2}x)\cdot f_2(T_2^{n+h_1}x)\,d\mu(x)$$

converge to $0$. After composing with $T_2^{-n}$ and using the Cauchy-Schwarz inequality, we reduce matters to showing that the averages in $n$ of

$$f_1(T_1^{n^2}T_2^{-n}x)\cdot f_1(T_1^{(n+h_1)^2}T_2^{-n}x)$$

converge to $0$ in $L^2(\mu)$. Using van der Corput's Lemma one more time, we reduce matters to showing that for every fixed $h_1\in\mathbb{N}$ and every large enough $h_2\in\mathbb{N}$, the averages in $n$ of

$$\int f_1(T_1^{n^2}T_2^{-n}x)\cdot f_1(T_1^{(n+h_1)^2}T_2^{-n}x)\cdot f_1(T_1^{(n+h_2)^2}T_2^{-n-h_2}x)\cdot f_1(T_1^{(n+h_1+h_2)^2}T_2^{-n-h_2}x)\,d\mu(x)$$

converge to $0$, or equivalently (after composing with $T_1^{-n^2}T_2^{n}$), that the averages in $n$ of

$$\int f_1(x)\cdot f_1(T_1^{2h_1n+h_1^2}x)\cdot f_1(T_1^{2h_2n+h_2^2}T_2^{-h_2}x)\cdot f_1(T_1^{2(h_1+h_2)n+(h_1+h_2)^2}T_2^{-h_2}x)\,d\mu(x) \tag{19}$$

converge to $0$.

The important property of this last average is that it involves only constant iterates of the transformation $T_2$ (for $h_1,h_2$ fixed). Therefore, we can apply the known results about the convergence of multiple ergodic averages of a single transformation. It follows from Theorem 2.2 that the averages in $n$ of (19) converge to $0$ for all $h_1,h_2\in\mathbb{N}$ such that the linear polynomials $2h_1n$, $2h_2n$, $2(h_1+h_2)n$ are distinct, that is, for all $h_1,h_2\in\mathbb{N}$ with $h_1\neq h_2$. The claim follows.
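The passage from the distinctness condition to $h_1\neq h_2$ is a one-line check, spelled out here as our own short verification. The coefficients of $n$ in the three non-constant iterates of $T_1$ in (19) are $2h_1$, $2h_2$ and $2(h_1+h_2)$, and

$$2h_1=2(h_1+h_2)\iff h_2=0,\qquad 2h_2=2(h_1+h_2)\iff h_1=0,\qquad 2h_1=2h_2\iff h_1=h_2.$$

Since $h_1,h_2\ge 1$, the first two equalities never hold, so the three iterates have pairwise distinct linear parts exactly when $h_1\neq h_2$. In particular this holds for every fixed $h_1$ and every $h_2>h_1$, which is what the second application of van der Corput's Lemma requires.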

We will come back to this example in Section 6.1.

4.2. Families of pairs and their type. In this subsection we follow [8], with some changes in the notation, in order to define the type of a family of pairs of polynomials.
4.2.1. Families of pairs of polynomials. Let $m\in\mathbb{N}$. Given two ordered families of polynomials

$$\mathcal{P}=(p_1,\ldots,p_m),\qquad \mathcal{Q}=(q_1,\ldots,q_m)$$

we define the ordered family of pairs of polynomials $(\mathcal{P},\mathcal{Q})$ as follows

$$(\mathcal{P},\mathcal{Q})=((p_1,q_1),\ldots,(p_m,q_m)).$$

The reader is advised to think of this family as an efficient way to record the polynomial iterates that appear in (17).

The maximum of the degrees of the polynomials in the families $\mathcal{P}$ and $\mathcal{Q}$ is called the degree of the family $(\mathcal{P},\mathcal{Q})$.

For convenience of exposition, if pairs of constant polynomials appear in $(\mathcal{P},\mathcal{Q})$ we remove them, and henceforth we assume:

All families $(\mathcal{P},\mathcal{Q})$ that we consider do not contain pairs of constant polynomials.

4.2.2. Definition of type. We fix an integer $d\ge 1$ and restrict ourselves to families $(\mathcal{P},\mathcal{Q})$ of degree $\le d$.

We say that two polynomials $p,q\in\mathbb{Z}[t]$ are equivalent, and write $p\sim q$, if they have the same degree and the same leading coefficient. Equivalently, $p\sim q$ if and only if $\deg(p-q)<\min\{\deg(p),\deg(q)\}$.

We define $\mathcal{Q}'$ to be the following (possibly empty) set

$$\mathcal{Q}'=\{q_i\in\mathcal{Q} : p_i \text{ is constant}\}.$$

For $i=1,\ldots,d$, let $w_{1,i}$ and $w_{2,i}$ be the number of distinct equivalence classes of polynomials of degree $i$ in $\mathcal{P}$ and $\mathcal{Q}'$ respectively.

We define the (matrix) type of the family $(\mathcal{P},\mathcal{Q})$ to be the $2\times d$ matrix

$$\begin{pmatrix} w_{1,d} & \cdots & w_{1,1}\\ w_{2,d} & \cdots & w_{2,1}\end{pmatrix}.$$

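If $\mathcal{Q}'$ is empty, then all the elements of the second row are taken to be $0$. For example, with $d=4$, the family

$$((n^2,n^4),\ (n^2+n,n),\ (2n^2,2n),\ (0,n^3),\ (0,n))$$

has type

$$\begin{pmatrix}0&0&2&0\\0&1&0&1\end{pmatrix}$$

(the polynomials of $\mathcal{P}$ of degree $2$ form the two classes $\{n^2,n^2+n\}$ and $\{2n^2\}$, and $\mathcal{Q}'=\{n^3,n\}$).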
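The type computation is mechanical, so a short script can double-check hand computations like the one above. The Python sketch below is our own illustration, not code from the paper: it assumes polynomials in $n$ are encoded as tuples of integer coefficients $(c_0,c_1,\ldots,c_e)$ with no trailing zeros (so $0$ is the empty tuple), assigns every constant degree $0$, and all names are hypothetical.

```python
from typing import List, Sequence, Tuple

# Hypothetical encoding: c0 + c1*n + ... + ce*n^e is the tuple (c0, c1, ..., ce)
# with no trailing zeros; the zero polynomial is ().
Poly = Tuple[int, ...]


def deg(p: Poly) -> int:
    """Degree of p; constants (including 0) get degree 0."""
    return max(len(p) - 1, 0)


def classes_of_degree(polys: Sequence[Poly], i: int) -> int:
    """Number of equivalence classes (same degree, same leading coefficient)
    among the polynomials of degree i >= 1."""
    return len({p[-1] for p in polys if deg(p) == i})


def matrix_type(pairs: Sequence[Tuple[Poly, Poly]], d: int) -> List[List[int]]:
    """2 x d type of a family of pairs, columns ordered from degree d down to 1."""
    P = [p for p, _ in pairs]
    Q_prime = [q for p, q in pairs if deg(p) == 0]  # q_i whose partner p_i is constant
    degrees = range(d, 0, -1)
    return [[classes_of_degree(P, i) for i in degrees],
            [classes_of_degree(Q_prime, i) for i in degrees]]


# The family ((n^2, n^4), (n^2+n, n), (2n^2, 2n), (0, n^3), (0, n)) with d = 4.
example = [((0, 0, 1), (0, 0, 0, 0, 1)),
           ((0, 1, 1), (0, 1)),
           ((0, 0, 2), (0, 2)),
           ((), (0, 0, 0, 1)),
           ((), (0, 1))]
print(matrix_type(example, 4))  # [[0, 0, 2, 0], [0, 1, 0, 1]]
```

Only the leading coefficient is stored per degree, because two polynomials of the same degree are equivalent exactly when their leading coefficients agree.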

We order the types lexicographically: we start by comparing the first element of the first row of each matrix, and after going through all the elements of the first row, we compare the elements of the second row. In symbols: given two $2\times d$ matrices $W=(w_{i,j})$ and $W'=(w'_{i,j})$, we say that $W>W'$ if $w_{1,d}>w'_{1,d}$, or $w_{1,d}=w'_{1,d}$ and $w_{1,d-1}>w'_{1,d-1}$, ..., or $w_{1,i}=w'_{1,i}$ for $i=1,\ldots,d$ and $w_{2,d}>w'_{2,d}$, and so on.
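For example, with $d=2$,

$$\begin{pmatrix}1&0\\0&0\end{pmatrix}>\begin{pmatrix}0&2\\0&0\end{pmatrix}>\begin{pmatrix}0&1\\1&1\end{pmatrix}>\begin{pmatrix}0&1\\1&0\end{pmatrix}>\begin{pmatrix}0&1\\0&*\end{pmatrix}>\begin{pmatrix}0&0\\1&*\end{pmatrix}>\begin{pmatrix}0&0\\0&1\end{pmatrix}>\begin{pmatrix}0&0\\0&0\end{pmatrix},$$

where in place of the stars one can put any non-negative integers.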
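Because each row of a type is listed from degree $d$ down to degree $1$, this order is ordinary lexicographic order on the entries read row by row. A minimal Python sketch (ours; the function names are hypothetical) makes this explicit and checks a sample decreasing chain of $2\times 2$ types, with arbitrary values substituted for stars.

```python
from typing import List


def flatten(W: List[List[int]]) -> List[int]:
    """Read a type row by row; each row lists w_{i,d}, ..., w_{i,1}."""
    return [w for row in W for w in row]


def type_greater(W: List[List[int]], W2: List[List[int]]) -> bool:
    """Return True if W > W2 for types of the same shape.

    Python compares lists lexicographically, which is exactly the order
    'first row from degree d down to 1, then the second row, and so on'.
    """
    return flatten(W) > flatten(W2)


# A sample decreasing chain of 2 x 2 types (stars replaced by 5 and 9).
chain = [[[1, 0], [0, 0]], [[0, 2], [0, 0]], [[0, 1], [1, 1]], [[0, 1], [1, 0]],
         [[0, 1], [0, 5]], [[0, 0], [1, 9]], [[0, 0], [0, 1]], [[0, 0], [0, 0]]]
print(all(type_greater(a, b) for a, b in zip(chain, chain[1:])))  # True

# Infinitely many types lie below a given one: every [[0, 1], [0, k]] < [[1, 0], [0, 0]].
print(all(type_greater([[1, 0], [0, 0]], [[0, 1], [0, k]]) for k in range(1000)))  # True
```

The second check illustrates why Lemma 4.3 is not automatic: a type can have infinitely many smaller types, yet there is no infinite strictly decreasing chain.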

An important observation is that although for a given type $W$ there are infinitely many possible types $W'<W$, we have

Lemma 4.3. Every decreasing sequence of types of families of polynomial pairs is stationary.

Therefore, if some operation reduces the type, then after a finite number of repetitions it is 
going to terminate. This is the basic principle behind all the PET induction arguments used 
in the literature and in this article. 

4.3. Nice families and the van der Corput operation. In this subsection we define a 
class of families of pairs of polynomials that we are going to work with in the sequel, and an 
important operation that preserves such families and reduces their type. 

4.3.1. Nice families. Let $\mathcal{P}=(p_1,\ldots,p_m)$ and $\mathcal{Q}=(q_1,\ldots,q_m)$.

Definition. We call the ordered family of pairs of polynomials $(\mathcal{P},\mathcal{Q})$ nice if

(i) $\deg(p_1)\ge\deg(p_i)$ for $i=1,\ldots,m$;

(ii) $\deg(p_1)>\deg(q_i)$ for $i=1,\ldots,m$;

(iii) $\deg(p_1-p_i)>\deg(q_1-q_i)$ for $i=2,\ldots,m$.

(Notice that a consequence of (iii) is that $p_1-p_i\neq\text{const}$ for $i=2,\ldots,m$.)
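Properties (i) to (iii) are easy to test mechanically. The sketch below is ours, not the authors'; it assumes a hypothetical encoding of polynomials as integer coefficient tuples without trailing zeros, reads (i) as a non-strict and (ii), (iii) as strict inequalities, and assigns every constant polynomial (including $0$) degree $0$, which is the convention under which the parenthetical remark after (iii) holds.

```python
from typing import Iterable, Sequence, Tuple

# Hypothetical encoding: coefficient tuples (c0, ..., ce), no trailing zeros; 0 is ().
Poly = Tuple[int, ...]


def trim(coeffs: Iterable[int]) -> Poly:
    """Drop trailing zero coefficients."""
    c = list(coeffs)
    while c and c[-1] == 0:
        c.pop()
    return tuple(c)


def sub(p: Poly, q: Poly) -> Poly:
    """Coefficient-wise difference p - q."""
    n = max(len(p), len(q))
    a = list(p) + [0] * (n - len(p))
    b = list(q) + [0] * (n - len(q))
    return trim(x - y for x, y in zip(a, b))


def deg(p: Poly) -> int:
    """Degree, with every constant (including 0) assigned degree 0."""
    return max(len(p) - 1, 0)


def is_nice(pairs: Sequence[Tuple[Poly, Poly]]) -> bool:
    """Check properties (i)-(iii) for ((p_1, q_1), ..., (p_m, q_m))."""
    p1, q1 = pairs[0]
    cond_i = all(deg(p1) >= deg(p) for p, _ in pairs)
    cond_ii = all(deg(p1) > deg(q) for _, q in pairs)
    cond_iii = all(deg(sub(p1, p)) > deg(sub(q1, q)) for p, q in pairs[1:])
    return cond_i and cond_ii and cond_iii


n3, n2, n = (0, 0, 0, 1), (0, 0, 1), (0, 1)
print(is_nice([(n3, ()), ((), n2)]))         # True: ((n^3, 0), (0, n^2))
print(is_nice([(n, n), ((0, 2), n)]))        # False: (ii) fails, deg q_1 = deg p_1
print(is_nice([(n2, ()), ((1, 0, 1), ())]))  # False: p_1 - p_2 = -1 is constant
```

The last call shows the remark after (iii) in action: if $p_1-p_i$ is a non-zero constant, its degree $0$ cannot exceed the degree of $q_1-q_i$.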

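As an example, if a nice family consists of $m$ pairs of polynomials and has degree $1$, then we have $\deg(p_1)=1$, $\deg(p_i)\le 1$ and $\deg(q_i)=0$ for $i=1,\ldots,m$, and $\deg(p_1-p_i)=1$ for $i=2,\ldots,m$. Moreover, no $p_i$ is constant, since otherwise $(p_i,q_i)$ would be a pair of constant polynomials. It follows that the type of this family is

$$\begin{pmatrix}0&\cdots&0&k\\0&\cdots&0&0\end{pmatrix} \tag{20}$$

for some $k\in\mathbb{N}$ with $k\le m$.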

4.3.2. The van der Corput operation. Given a family $\mathcal{P}=(p_1,\ldots,p_m)$, $p\in\mathbb{Z}[t]$, and $h\in\mathbb{N}$, we define

$$S_h\mathcal{P}=(p_1(n+h),\ldots,p_m(n+h))\quad\text{and}\quad \mathcal{P}-p=(p_1-p,\ldots,p_m-p).$$

Given a family of pairs of polynomials $(\mathcal{P},\mathcal{Q})$, a pair $(p,q)\in(\mathcal{P},\mathcal{Q})$, and $h\in\mathbb{N}$, we define the following operation

$$(p,q,h)\text{-vdC}(\mathcal{P},\mathcal{Q})=(P,Q)^*$$

where

$$P=(S_h\mathcal{P}-p,\ \mathcal{P}-p),\qquad Q=(S_h\mathcal{Q}-q,\ \mathcal{Q}-q),$$

and $*$ is the operation that removes all pairs of constant polynomials from a given family of pairs of polynomials. A more explicit form of the family $(p,q,h)\text{-vdC}(\mathcal{P},\mathcal{Q})$ is

$$\big((S_hp_1-p,\,S_hq_1-q),\ldots,(S_hp_m-p,\,S_hq_m-q),\,(p_1-p,\,q_1-q),\ldots,(p_m-p,\,q_m-q)\big)^*.$$

Notice that if the family $(\mathcal{P},\mathcal{Q})$ has degree $d$ and contains $m$ pairs of polynomials, then for every $h\in\mathbb{N}$, the family $(p,q,h)\text{-vdC}(\mathcal{P},\mathcal{Q})$ has degree at most $d$ and contains at most $2m$ pairs of polynomials.
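The operation is purely algebraic, so a short script can reproduce hand computations such as the one in Section 4.4. This Python sketch is our own; polynomials are encoded (hypothetically) as integer coefficient tuples without trailing zeros, $S_h$ is computed with the binomial theorem, and the final filter plays the role of $*$. The example uses the sample value $h=5$.

```python
from math import comb
from typing import Iterable, List, Sequence, Tuple

Poly = Tuple[int, ...]  # hypothetical encoding (c0, ..., ce), no trailing zeros; 0 is ()


def trim(coeffs: Iterable[int]) -> Poly:
    """Drop trailing zero coefficients."""
    c = list(coeffs)
    while c and c[-1] == 0:
        c.pop()
    return tuple(c)


def sub(p: Poly, q: Poly) -> Poly:
    """Coefficient-wise difference p - q."""
    n = max(len(p), len(q))
    a = list(p) + [0] * (n - len(p))
    b = list(q) + [0] * (n - len(q))
    return trim(x - y for x, y in zip(a, b))


def shift(p: Poly, h: int) -> Poly:
    """S_h p, i.e. the coefficients of n -> p(n + h), via the binomial theorem."""
    out = [0] * len(p)
    for k, c in enumerate(p):
        for j in range(k + 1):
            out[j] += c * comb(k, j) * h ** (k - j)
    return trim(out)


def is_const(p: Poly) -> bool:
    return len(p) <= 1


def vdc(pairs: Sequence[Tuple[Poly, Poly]], p: Poly, q: Poly, h: int) -> List[Tuple[Poly, Poly]]:
    """(p, q, h)-vdC: shifted pairs minus (p, q), then unshifted pairs minus (p, q),
    followed by the * operation that drops pairs of constant polynomials."""
    shifted = [(sub(shift(pi, h), p), sub(shift(qi, h), q)) for pi, qi in pairs]
    unshifted = [(sub(pi, p), sub(qi, q)) for pi, qi in pairs]
    return [(a, b) for a, b in shifted + unshifted
            if not (is_const(a) and is_const(b))]


# Section 4.4: apply (0, n^2, h)-vdC to ((n^3, 0), (0, n^2)) with h = 5.
family = [((0, 0, 0, 1), ()), ((), (0, 0, 1))]
print(vdc(family, (), (0, 0, 1), 5))
# [((125, 75, 15, 1), (0, 0, -1)), ((), (25, 10)), ((0, 0, 0, 1), (0, 0, -1))]
```

The output encodes $((n+5)^3,-n^2)$, $(0,10n+25)$ and $(n^3,-n^2)$, i.e. the family of Section 4.4 with $2hn+h^2=10n+25$; the pair $(0,0)$ has been removed by $*$.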
4.4. An example. In order to explain our method we give an example that is somewhat more complicated than the example of Section 4.1. When we study the limiting behavior of the averages

$$\frac{1}{N-M}\sum_{n=M}^{N-1} f_1(T_1^{n^3}x)\cdot f_2(T_2^{n^2}x)$$

we define $\mathcal{P}=(n^3,0)$, $\mathcal{Q}=(0,n^2)$, and introduce the family of pairs of polynomials

$$(\mathcal{P},\mathcal{Q})=((n^3,0),(0,n^2)).$$

This family is nice and has type $\begin{pmatrix}1&0&0\\0&1&0\end{pmatrix}$. Applying the vdC operation with $(p,q)=(0,n^2)$ and $h\in\mathbb{N}$, we arrive at the new family

$$(0,n^2,h)\text{-vdC}(\mathcal{P},\mathcal{Q})=(P_h,Q_h)^*$$

where

$$P_h=((n+h)^3,\,0,\,n^3,\,0),\qquad Q_h=(-n^2,\,2hn+h^2,\,-n^2,\,0);$$

then the corresponding family of pairs (after removing the pair of constant polynomials $(0,0)$) is

$$\big(((n+h)^3,-n^2),\ (0,2hn+h^2),\ (n^3,-n^2)\big).$$

The important point is that for every $h\in\mathbb{N}$ this new family is also nice and has smaller type, namely $\begin{pmatrix}1&0&0\\0&0&1\end{pmatrix}$. Translating back to ergodic theory, we get the averages

$$\frac{1}{N-M}\sum_{n=M}^{N-1} f(T_1^{(n+h)^3}T_2^{-n^2}x)\cdot g(T_2^{2hn+h^2}x)\cdot h(T_1^{n^3}T_2^{-n^2}x)$$

for some choice of functions $f,g,h\in L^\infty(\mu)$. Concerning the choice of these functions, the only important thing for our purposes is that $f=f_1$.

4.5. The general strategy. As in the previous example, we are going to show that if we are given a nice family $(\mathcal{P},\mathcal{Q})$ with $\deg(p_1)\ge 2$, then it is always possible to find an appropriate $(p,q)\in(\mathcal{P},\mathcal{Q})$ so that for all large enough $h\in\mathbb{N}$ the operation $(p,q,h)$-vdC leads to a nice family that has smaller type. Our objective is, after successively applying the operation $(p,q,h)$-vdC, to finally get nice families of degree $1$, and thus with matrix type of the form (20).

Translating this back to ergodic theory, we get multiple ergodic averages (with certain parameters) where: (i) only linear iterates of the transformation $T_1$ appear and the iterates of $T_2$ are constant, and (ii) the "first" iterate of $T_1$ is applied to the "first" function of the original average. The advantage now is that the limiting behavior of such averages can be treated easily using the well developed theory of multiple ergodic averages involving a single transformation.

Let us remark though that in practice this process becomes cumbersome very quickly. For instance, in the example of Section 4.4, for every $h\in\mathbb{N}$ the next $(p_h,q_h,h')$-vdC operation uses $(p_h,q_h)=(0,2hn+h^2)$ and leads to a family with matrix type $\begin{pmatrix}1&0&0\\0&0&0\end{pmatrix}$. The subsequent vdC operation leads (for generic values of the parameters) to a family with matrix type $\begin{pmatrix}0&7&0\\0&0&0\end{pmatrix}$. One then has to apply the vdC operation a huge number of times (it is not even easy to estimate this number) in order to reduce the matrix type to the form (20). So even in the case of two commuting transformations, it is practically impossible to spell out the details of how this process works when both polynomial iterates are non-linear.
4.6. Choosing a good vdC operation. The next lemma is the key ingredient used to carry out the previous plan. To prove it we use freely the following easy fact: if $p,q$ are two non-constant polynomials and $p\sim q$, then $\deg(p-q)\le\deg(p)-1$, and with the possible exception of one $h\in\mathbb{Z}$ we have $\deg(S_hp-q)=\deg(p)-1$.
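For the reader's convenience, here is a short derivation of this fact (ours, using only the binomial theorem). Let $e=\deg(p)\ge 1$. Since $p\sim q$ we can write

$$p(t)=at^e+bt^{e-1}+r_1(t),\qquad q(t)=at^e+ct^{e-1}+r_2(t),\qquad a\neq 0,$$

where $r_1,r_2$ have degree at most $e-2$ (and vanish if $e=1$). Then $p-q=(b-c)t^{e-1}+(r_1-r_2)(t)$, so $\deg(p-q)\le e-1$. Since $(t+h)^e=t^e+eht^{e-1}+\cdots$,

$$p(t+h)-q(t)=(aeh+b-c)\,t^{e-1}+\big(\text{terms of degree at most } e-2\big).$$

The leading coefficient $aeh+b-c$ vanishes for at most one $h\in\mathbb{Z}$, namely $h=(c-b)/(ae)$ when this is an integer, so $\deg(S_hp-q)=e-1$ for every other $h$.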

Lemma 4.4. Let $(\mathcal{P},\mathcal{Q})$ be a nice family of pairs of polynomials, and suppose that $\deg(p_1)\ge 2$.

Then there exists $(p,q)\in(\mathcal{P},\mathcal{Q})$ such that for every large enough $h\in\mathbb{N}$, the family $(p,q,h)\text{-vdC}(\mathcal{P},\mathcal{Q})$ is nice and has strictly smaller type than that of $(\mathcal{P},\mathcal{Q})$.

Proof. Let $\mathcal{P}=(p_1,\ldots,p_m)$ and $\mathcal{Q}=(q_1,\ldots,q_m)$. Then for $(p,q)\in(\mathcal{P},\mathcal{Q})$ and $h\in\mathbb{N}$ the family $(p,q,h)\text{-vdC}(\mathcal{P},\mathcal{Q})$ is an ordered family of pairs of polynomials, all of them of the form

$$(S_hp_i-p,\ S_hq_i-q)\quad\text{or}\quad(p_i-p,\ q_i-q).$$

We choose $(p,q)$ as follows. If $\mathcal{Q}'$ is non-empty, then we take $p=0$ and let $q$ be a polynomial of smallest degree in $\mathcal{Q}'$. Then the first row of the matrix type remains unchanged, and the second row gets "reduced", leading to a smaller matrix type. Suppose now that $\mathcal{Q}'$ is empty. If $\mathcal{P}$ consists of a single polynomial $p_1$, then we choose $(p,q)=(p_1,q_1)$ and the result follows. Therefore, we can assume that $\mathcal{P}$ contains a polynomial other than $p_1$. We consider two cases. If $p\sim p_1$ for all $p\in\mathcal{P}$, then we choose $(p,q)=(p_1,q_1)$. Otherwise, we choose $(p,q)\in(\mathcal{P},\mathcal{Q})$ such that $p\not\sim p_1$ and $p$ is a polynomial in $\mathcal{P}$ with minimal degree (such a choice exists since $p_1$ has the highest degree in $\mathcal{P}$).

In all these cases, for every $h\in\mathbb{N}$, the first row of the matrix type of $(p,q,h)\text{-vdC}(\mathcal{P},\mathcal{Q})$ is "smaller" than that of $(\mathcal{P},\mathcal{Q})$, and as a consequence the new family has strictly smaller type.

It remains to verify that for every large enough $h\in\mathbb{N}$ the ordered family of pairs of polynomials $(p,q,h)\text{-vdC}(\mathcal{P},\mathcal{Q})$ is nice. We remark that, by construction, the first polynomial pair in this family is $(S_hp_1-p,\ S_hq_1-q)$.

Claim. Property (i) holds for every $h\in\mathbb{N}$.

Equivalently, we claim that

$$\deg(S_hp_1-p)\ge\max\{\deg(p_i-p),\deg(S_hp_i-p)\}\quad\text{for } i=1,\ldots,m.$$

If $p\not\sim p_1$, then $\deg(S_hp_1-p)=\deg(p_1)$ and the claim follows from our assumption $\deg(p_1)\ge\deg(p_i)$ for $i=1,\ldots,m$. If $p\sim p_1$, then by the choice of the polynomial $p$ we have $p=p_1$ and $p\sim p_i$ for $i=1,\ldots,m$. As a result, $\deg(S_hp_1-p)=\deg(p_1)-1$ and $\max\{\deg(p_i-p),\deg(S_hp_i-p)\}\le\deg(p_1)-1$, proving the claim.

Claim. Property (ii) holds for every $h\in\mathbb{N}$.

Equivalently, we claim that

$$\deg(S_hp_1-p)>\max\{\deg(q_i-q),\deg(S_hq_i-q)\}\quad\text{for } i=1,\ldots,m.$$

If $p\not\sim p_1$, then $\deg(S_hp_1-p)=\deg(p_1)$ and the claim follows since by assumption $\deg(p_1)>\deg(q_i)$ for $i=1,\ldots,m$. If $p\sim p_1$, then by the choice of $p$ we have $(p,q)=(p_1,q_1)$ and $p_1\sim p_i$ for $i=1,\ldots,m$. By hypothesis we have, for $i=2,\ldots,m$,

$$\deg(q_1-q_i)<\deg(p_1-p_i)\le\deg(p_1)-1=\deg(S_hp_1-p_1). \tag{21}$$

It remains to verify that $\deg(S_hp_1-p_1)>\deg(S_hq_i-q_1)$. To see this we express $S_hq_i-q_1$ as $(S_hq_i-q_i)+(q_i-q_1)$. If $q_i$ is non-constant, then the first polynomial has degree $\deg(q_i)-1<\deg(p_1)-1=\deg(S_hp_1-p_1)$. If $q_i$ is constant, then the first polynomial vanishes, so its degree is $<\deg(p_1)-1=\deg(S_hp_1-p_1)$ (we used here that $\deg(p_1)\ge 2$). Furthermore, by (21) the second polynomial has degree $\deg(q_i-q_1)<\deg(S_hp_1-p_1)$ (for $i=1$ it is zero). This proves the claim.

ERGODIC AVERAGES OF COMMUTING TRANSFORMATIONS WITH DISTINCT DEGREE.

Claim. Property (iii) holds for all except finitely many values of $h$.

Equivalently, we claim that

$$\deg(S_hp_1-S_hp_i)>\deg(S_hq_1-S_hq_i)\quad\text{for } i=2,\ldots,m,$$

and

$$\deg(S_hp_1-p_i)>\deg(S_hq_1-q_i)\quad\text{for } i=1,\ldots,m.$$

The first estimate follows immediately from our hypothesis $\deg(p_1-p_i)>\deg(q_1-q_i)$ for $i=2,\ldots,m$. It remains to verify the second estimate. If $p_i\not\sim p_1$, then $\deg(S_hp_1-p_i)=\deg(p_1)$ and the claim follows since by hypothesis $\deg(p_1)>\deg(q_i)$ for $i=1,\ldots,m$. Suppose now that $p_i\sim p_1$. Then $\deg(S_hp_1-p_i)=\deg(p_1)-1$, with the possible exception of one $h\in\mathbb{N}$ (hence we get at most $m-1$ exceptional values of $h$). So it remains to verify that $\deg(S_hq_1-q_i)<\deg(p_1)-1$. To see this we express $S_hq_1-q_i$ as $(S_hq_1-q_1)+(q_1-q_i)$. The first polynomial has degree $\deg(q_1)-1<\deg(p_1)-1$ if $q_1$ is non-constant, and degree $<\deg(p_1)-1$ (we used that $\deg(p_1)\ge 2$) if $q_1$ is constant. The second polynomial is zero if $i=1$, and otherwise has degree $\deg(q_1-q_i)<\deg(p_1-p_i)\le\deg(p_1)-1$ since $p_i\sim p_1$. This establishes the claim and completes the proof. □
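The choice of $(p,q)$ made at the beginning of this proof is a small deterministic rule, so it can be sketched in code. The Python below is our own illustration: it assumes the hypothetical encoding of polynomials as integer coefficient tuples $(c_0,\ldots,c_e)$ without trailing zeros, assumes the input is a nice family containing no pairs of constants, and returns $p=0$ (the empty tuple) in the first case, as in the text.

```python
from typing import Sequence, Tuple

Poly = Tuple[int, ...]  # hypothetical encoding (c0, ..., ce), no trailing zeros; 0 is ()


def deg(p: Poly) -> int:
    return max(len(p) - 1, 0)  # constants, including 0, have degree 0


def equivalent(p: Poly, q: Poly) -> bool:
    """p ~ q: same degree and same leading coefficient (p, q non-constant)."""
    return deg(p) == deg(q) and p[-1] == q[-1]


def choose_pair(pairs: Sequence[Tuple[Poly, Poly]]) -> Tuple[Poly, Poly]:
    """The choice of (p, q) made in the proof of Lemma 4.4."""
    Q_prime = [q for p, q in pairs if deg(p) == 0]
    if Q_prime:
        # p = 0 and q of smallest degree in Q' (this reduces the second row).
        return (), min(Q_prime, key=deg)
    p1, q1 = pairs[0]
    if all(equivalent(p, p1) for p, _ in pairs):
        return p1, q1  # also covers the case of a single pair
    # Otherwise: a pair whose p is not equivalent to p_1, of minimal degree.
    return min(((p, q) for p, q in pairs if not equivalent(p, p1)),
               key=lambda pq: deg(pq[0]))


# Section 4.4: the first choice for ((n^3, 0), (0, n^2)), then the choice for
# ((n+h)^3, -n^2), (0, 2hn+h^2), (n^3, -n^2) with the sample value h = 5.
print(choose_pair([((0, 0, 0, 1), ()), ((), (0, 0, 1))]))  # ((), (0, 0, 1))
step2 = [((125, 75, 15, 1), (0, 0, -1)), ((), (25, 10)), ((0, 0, 0, 1), (0, 0, -1))]
print(choose_pair(step2))  # ((), (25, 10))
```

Both outputs agree with the choices described in Sections 4.4 and 4.5, namely $(0,n^2)$ and then $(0,2hn+h^2)=(0,10n+25)$.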

We say that a subset of $\mathbb{N}^k$ is good if it is of the form

$$\{h_1>c_1,\ h_2>c_2(h_1),\ \ldots,\ h_k>c_k(h_1,\ldots,h_{k-1})\} \tag{22}$$

for some $c_i\colon\mathbb{N}^{i-1}\to\mathbb{N}$. The next lemma will be used in order to prove that the level $k$ of the characteristic factors $\mathcal{Z}_{k,T_1}$ considered in Theorem 1.2 depends only on the number and the maximum degree of the polynomials involved.

Lemma 4.5. Let $(\mathcal{P},\mathcal{Q})$ be a nice family with degree $d\ge 2$ that contains $m$ pairs of polynomials. Suppose that we successively apply the $(p,q,h)$-vdC operation for appropriate choices of $p,q\in\mathbb{Z}[t]$ and $h\in\mathbb{N}$, as described in the previous lemma, each time getting a nice family of pairs of polynomials with strictly smaller matrix type.

Then after a finite number of operations we get, for a good set of parameters, nice families of pairs of polynomials of degree $1$. Moreover, the number of operations needed can be bounded by a function of $d$ and $m$ alone.

Remark. The exact dependency on d and m seems neither easy nor very useful to pin down; it 
appears to be a tower of exponentials the length of which depends on d and m. 

Proof. We fix $d\ge 2$. The first statement follows immediately from Lemma 4.3.

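We denote by $W(\mathcal{P},\mathcal{Q})$ the matrix type of a given family $(\mathcal{P},\mathcal{Q})$, and by $N(\mathcal{P},\mathcal{Q})$ the number of operations mentioned in the statement that are needed, starting from $(\mathcal{P},\mathcal{Q})$, to reach nice families of degree $1$.

First we claim that it suffices to show the following: for every nice family $(\mathcal{P},\mathcal{Q})$ with degree $d$ containing at most $m$ pairs of polynomials, we have

$$N(\mathcal{P},\mathcal{Q})\le f(W(\mathcal{P},\mathcal{Q}),m) \tag{23}$$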

for some function $f$, with the obvious domain, and range in the non-negative integers. Indeed, since there are finitely many possible matrix types for a family $(\mathcal{P},\mathcal{Q})$ with degree at most $d$ containing at most $m$ pairs of polynomials (in fact there are at most $(m+1)^{2d}$ such matrix types), we have

$$N(\mathcal{P},\mathcal{Q})\le F(d,m)=\max_W f(W,m),$$

where $W$ ranges over all possible matrix types of nice families with degree $d$ that contain at most $m$ pairs of polynomials. This proves our claim.
Next we turn our attention to establishing (23). Let

$$W_1=\begin{pmatrix}0&\cdots&0&1\\0&\cdots&0&0\end{pmatrix}.$$

Notice that $W_1$ is the smallest matrix type (with respect to the order introduced before) that can appear as a first coordinate entry in the domain of $f$. We define $f$ recursively as follows:

$$f(W_1,m)=0\ \text{ for every } m\in\mathbb{N},\qquad f(W,m)=\max_{W'<W} f(W',2m)+1\ \text{ for } W>W_1, \tag{24}$$

where the maximum is taken over the finitely many possible matrix types $W'$ of nice families of degree at most $d$ that contain at most $2m$ pairs of polynomials.

Since every $(p,q,h)$-vdC operation of the previous lemma preserves nice families of pairs of polynomials, does not increase their degree, reduces their matrix type, and at most doubles the number $m$ of (non-constant) pairs of polynomials in the family, a straightforward induction on the type $W(\mathcal{P},\mathcal{Q})$ establishes (23) with $f$ defined by (24). This completes the proof. □

4.7. Proof of Proposition 4.1. Let $(\mathcal{P},\mathcal{Q})$ be a nice family of pairs of polynomials where $\mathcal{P}=(p_1,\ldots,p_m)$ and $\mathcal{Q}=(q_1,\ldots,q_m)$, and let $d$ be the degree of this family. We remind the reader that our goal is to show that there exists $k=k(d,m)\in\mathbb{N}$ such that: if $f_1\perp\mathcal{Z}_{k,T_1}$, then the averages of

$$f_1(T_1^{p_1(n)}T_2^{q_1(n)}x)\cdot\ldots\cdot f_m(T_1^{p_m(n)}T_2^{q_m(n)}x) \tag{25}$$

converge to $0$ in $L^2(\mu)$.

(a) Suppose first that $\deg(p_1)=1$. Since the family $(\mathcal{P},\mathcal{Q})$ is nice, we have $\deg(p_i)=1$ for $i=1,\ldots,m$, all the polynomials $q_1,\ldots,q_m$ are constant, and $p_1-p_i\neq\text{const}$ for $i=2,\ldots,m$. In other words, we are reduced to studying the limiting behavior of the averages in $n$ of

$$f_1(T_1^{a_1n+b_1}T_2^{c_1}x)\cdot f_2(T_1^{a_2n+b_2}T_2^{c_2}x)\cdot\ldots\cdot f_m(T_1^{a_mn+b_m}T_2^{c_m}x)$$

where $a_i,b_i,c_i\in\mathbb{Z}$, $a_i\neq 0$ for $i=1,\ldots,m$, and $a_1\neq a_i$ for $i=2,\ldots,m$. Suppose that $f_1\perp\mathcal{Z}_{m-1,T_1}$; then also $f_1\circ T_2^{c_1}\perp\mathcal{Z}_{m-1,T_1}$ (since $T_1$ and $T_2$ commute). By Theorem 2.2 the previous averages converge to $0$ in $L^2(\mu)$, and as a consequence the same holds for the averages of (25).

(b) Suppose now that $\deg(p_1)\ge 2$. Our objective is to repeatedly use van der Corput's Lemma in order to reduce matters to the previously established linear case.

To begin with, using van der Corput's Lemma we see that in order to establish convergence to $0$ for the averages of (25), it suffices to show that, for every sufficiently large $h\in\mathbb{N}$, the averages in $n$ of

$$\int f_1(T_1^{p_1(n+h)}T_2^{q_1(n+h)}x)\cdot\ldots\cdot f_m(T_1^{p_m(n+h)}T_2^{q_m(n+h)}x)\cdot f_1(T_1^{p_1(n)}T_2^{q_1(n)}x)\cdot\ldots\cdot f_m(T_1^{p_m(n)}T_2^{q_m(n)}x)\,d\mu$$
converge to $0$. We compose with $T_1^{-p(n)}T_2^{-q(n)}$, where $(p,q)\in(\mathcal{P},\mathcal{Q})$ is chosen as in Lemma 4.4, and use the Cauchy-Schwarz inequality. This reduces matters to showing that, for every sufficiently large $h\in\mathbb{N}$, the averages in $n$ of

$$f_1(T_1^{p_1(n+h)-p(n)}T_2^{q_1(n+h)-q(n)}x)\cdot\ldots\cdot f_m(T_1^{p_m(n+h)-p(n)}T_2^{q_m(n+h)-q(n)}x)\cdot f_1(T_1^{p_1(n)-p(n)}T_2^{q_1(n)-q(n)}x)\cdot\ldots\cdot f_m(T_1^{p_m(n)-p(n)}T_2^{q_m(n)-q(n)}x) \tag{26}$$

converge to $0$ in $L^2(\mu)$. We remove the functions that happen to be composed with constant iterates of $T_1$ and $T_2$, since they do not affect convergence to $0$. This corresponds to the operation $*$ defined in Section 4.3.2. We get multiple ergodic averages that correspond to the families of polynomials $(p,q,h)\text{-vdC}(\mathcal{P},\mathcal{Q})$; our goal is to show convergence to $0$ in $L^2(\mu)$ for every large enough $h\in\mathbb{N}$.

By Lemma 4.4, for every large enough $h\in\mathbb{N}$, the family $(p,q,h)\text{-vdC}(\mathcal{P},\mathcal{Q})$ is nice, and its first pair is $(p_1(n+h)-p(n),\ q_1(n+h)-q(n))$. Notice also that in (26) the iterate $T_1^{p_1(n+h)-p(n)}T_2^{q_1(n+h)-q(n)}$ is applied to the function $f_1$. We consider two cases depending on the degree of the polynomial $p_1(n+h)-p(n)$.

(b1) If $\deg(p_1(n+h)-p(n))=1$, then we are reduced to the case (a) studied before. As we explained, if $f_1\perp\mathcal{Z}_{2m,T_1}$, then the averages (26) converge to $0$ in $L^2(\mu)$ for every large enough $h\in\mathbb{N}$. As a consequence, the averages (25) converge to $0$ in $L^2(\mu)$.

(b2) If $\deg(p_1(n+h)-p(n))\ge 2$, then we can iterate the "van der Corput operation". By Lemma 4.5 there exists $k=k(d,m)\in\mathbb{N}$ such that after at most $k$ such operations, we arrive at averages involving, for a good set of parameters $G$ of the form (22), nice families of pairs of polynomials of the type studied in part (a). More precisely, we are left with studying the averages in $n$ of

$$g_1(T_1^{a_1n+b_1}T_2^{c_1}x)\cdot\ldots\cdot g_{\tilde m}(T_1^{a_{\tilde m}n+b_{\tilde m}}T_2^{c_{\tilde m}}x) \tag{27}$$

where the functions $g_i$ and the integers $a_i,b_i,c_i$ depend on $k$ parameters, and satisfy: (i) $g_1=f_1$ (this last condition follows easily from the definition of the vdC operation), and (ii) $a_1\neq a_i$ for $i\in\{2,\ldots,\tilde m\}$. Our goal is to show convergence to $0$ in $L^2(\mu)$ for the averages of (27) for this good set of parameters $G$. Then repeated uses of van der Corput's Lemma show that the averages of (25) converge to $0$ in $L^2(\mu)$.

We proceed to establish our goal. Since the number of functions involved at most doubles after each vdC operation, we have $\tilde m\le 2^k m$. It follows from Theorem 2.2 and properties (i) and (ii) above that if $f_1\perp\mathcal{Z}_{2^km,T_1}$, then for every choice of parameters in the good set $G$, the averages of (27) converge to $0$ in $L^2(\mu)$, establishing our goal. As a consequence, the averages of (25) converge to $0$ in $L^2(\mu)$.

Concluding, if $f_1\perp\mathcal{Z}_{2^km,T_1}$, then in all cases we showed that the averages of (25) converge to $0$ in $L^2(\mu)$. This completes the proof of Proposition 4.1. □

5. A CHARACTERISTIC FACTOR FOR THE HIGHEST DEGREE ITERATE: THE GENERAL CASE 

The next proposition generalizes Proposition 4.1 to the case of an arbitrary number of transformations. Its proof is very similar to the proof of Proposition 4.1 given in the previous section. To avoid unnecessary repetition, we define the concepts needed in the proof of Proposition 5.1 and then only summarize its proof, providing details only when non-trivial modifications of the arguments used in the previous section are needed.

Proposition 5.1. Let $(X,\mathcal{X},\mu,T_1,\ldots,T_\ell)$ be a system, and $f_1,\ldots,f_m\in L^\infty(\mu)$. Suppose that $(\mathcal{P}_1,\ldots,\mathcal{P}_\ell)$ is a nice ordered family of $\ell$-tuples of polynomials with degree $d$ (all notions are defined below).

Then there exists $k=k(d,\ell,m)\in\mathbb{N}$ such that: if $f_1\perp\mathcal{Z}_{k,T_1}$, then the averages

$$\frac{1}{N-M}\sum_{n=M}^{N-1} f_1(T_1^{p_{1,1}(n)}\cdots T_\ell^{p_{\ell,1}(n)}x)\cdot\ldots\cdot f_m(T_1^{p_{1,m}(n)}\cdots T_\ell^{p_{\ell,m}(n)}x)$$

converge to $0$ in $L^2(\mu)$.

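Applying this result to the family $(\mathcal{P}_1,\ldots,\mathcal{P}_\ell)$ where $\mathcal{P}_1=(p_1,0,\ldots,0)$, $\mathcal{P}_2=(0,p_2,\ldots,0)$, ..., $\mathcal{P}_\ell=(0,\ldots,0,p_\ell)$, we get:

Corollary 5.2. Let $(X,\mathcal{X},\mu,T_1,\ldots,T_\ell)$ be a system, and $f_1,\ldots,f_\ell\in L^\infty(\mu)$. Let $p_1,\ldots,p_\ell$ be integer polynomials with distinct degrees and highest degree $d=\deg(p_1)$.

Then there exists $k=k(d,\ell)$ such that: if $f_1\perp\mathcal{Z}_{k,T_1}$, then the averages

$$\frac{1}{N-M}\sum_{n=M}^{N-1} f_1(T_1^{p_1(n)}x)\cdot\ldots\cdot f_\ell(T_\ell^{p_\ell(n)}x)$$

converge to $0$ in $L^2(\mu)$.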

5.1. Families of $\ell$-tuples and their types. In this subsection we follow [8] with some changes in the notation.

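5.1.1. Families of $\ell$-tuples of polynomials. Let $\ell,m\in\mathbb{N}$. Given $\ell$ ordered families of polynomials

$$\mathcal{P}_1=(p_{1,1},\ldots,p_{1,m}),\ \ldots,\ \mathcal{P}_\ell=(p_{\ell,1},\ldots,p_{\ell,m})$$

we define an ordered family of $m$ polynomial $\ell$-tuples as follows

$$(\mathcal{P}_1,\ldots,\mathcal{P}_\ell)=\big((p_{1,1},\ldots,p_{\ell,1}),\ldots,(p_{1,m},\ldots,p_{\ell,m})\big).$$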

The reader is advised to think of this family as an efficient way to record the polynomial iterates that appear in the average of

$$f_1(T_1^{p_{1,1}(n)}\cdots T_\ell^{p_{\ell,1}(n)}x)\cdot\ldots\cdot f_m(T_1^{p_{1,m}(n)}\cdots T_\ell^{p_{\ell,m}(n)}x).$$

The maximum of the degrees of the polynomials in the families $\mathcal{P}_1,\ldots,\mathcal{P}_\ell$ is called the degree of the family $(\mathcal{P}_1,\ldots,\mathcal{P}_\ell)$.

For convenience of exposition, if $\ell$-tuples of constant polynomials appear in $(\mathcal{P}_1,\ldots,\mathcal{P}_\ell)$ we remove them, and henceforth we assume:

All families $(\mathcal{P}_1,\ldots,\mathcal{P}_\ell)$ that we consider do not contain $\ell$-tuples of constant polynomials.
5.1.2. Definition of type. We fix $d\ge 1$ and restrict ourselves to families of degree $\le d$. For $i=1,\ldots,\ell$, we define $\mathcal{P}'_i$ to be the following (possibly empty) set

$$\mathcal{P}'_i=\{\text{non-constant } p_{i,j}\in\mathcal{P}_i : p_{i',j} \text{ is constant for } i'<i\}.$$

(It follows that $\mathcal{P}'_1$ is the set of non-constant polynomials in $\mathcal{P}_1$.)

For $i=1,\ldots,\ell$ and $j=1,\ldots,d$, we let $w_{i,j}$ be the number of distinct equivalence classes of polynomials of degree $j$ in the family $\mathcal{P}'_i$.

We define the (matrix) type of the family $(\mathcal{P}_1,\ldots,\mathcal{P}_\ell)$ to be the matrix

$$\begin{pmatrix} w_{1,d}&\cdots&w_{1,1}\\ w_{2,d}&\cdots&w_{2,1}\\ \vdots& &\vdots\\ w_{\ell,d}&\cdots&w_{\ell,1}\end{pmatrix}.$$

For example, let $d=4$, and consider the family of triples of polynomials

$$\big((n^2,n^4,n^4),\ (n^2+n,3n^3,0),\ (2n^2,0,2n),\ (n,2n,0),\ (0,n^3,n^4),\ (0,2n^3,n^2),\ (0,0,n^3),\ (0,0,n^3+1)\big).$$

Since

$$\mathcal{P}'_1=\{n^2,\,n^2+n,\,2n^2,\,n\},\qquad \mathcal{P}'_2=\{n^3,\,2n^3\},\qquad \mathcal{P}'_3=\{n^3,\,n^3+1\},$$

the type of this family is

$$\begin{pmatrix}0&0&2&1\\0&2&0&0\\0&1&0&0\end{pmatrix}.$$
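The same bookkeeping works for $\ell$-tuples. Below is a Python sketch of ours (hypothetical names; polynomials encoded as integer coefficient tuples without trailing zeros, constants given degree $0$) that builds each $\mathcal{P}'_i$ from the definition, computes the $\ell\times d$ type, and reproduces the example above.

```python
from typing import List, Sequence, Tuple

Poly = Tuple[int, ...]  # hypothetical encoding (c0, ..., ce), no trailing zeros; 0 is ()


def deg(p: Poly) -> int:
    return max(len(p) - 1, 0)  # constants, including 0, have degree 0


def tuple_family_type(family: Sequence[Tuple[Poly, ...]], d: int) -> List[List[int]]:
    """l x d type: row i counts equivalence classes, by degree d..1, in P'_i."""
    ell = len(family[0])
    rows = []
    for i in range(ell):
        # P'_i: non-constant i-th entries of tuples whose earlier entries are constant.
        P_i = [t[i] for t in family
               if deg(t[i]) >= 1 and all(deg(t[k]) == 0 for k in range(i))]
        rows.append([len({p[-1] for p in P_i if deg(p) == j}) for j in range(d, 0, -1)])
    return rows


n, n2, n3, n4 = (0, 1), (0, 0, 1), (0, 0, 0, 1), (0, 0, 0, 0, 1)
family = [(n2, n4, n4), ((0, 1, 1), (0, 0, 0, 3), ()), ((0, 0, 2), (), (0, 2)),
          (n, (0, 2), ()), ((), n3, n4), ((), (0, 0, 0, 2), n2), ((), (), n3),
          ((), (), (1, 0, 0, 1))]
print(tuple_family_type(family, 4))  # [[0, 0, 2, 1], [0, 2, 0, 0], [0, 1, 0, 0]]
```

For $\ell=2$ this reduces to the type of a family of pairs, since $\mathcal{P}'_2$ is then exactly $\mathcal{Q}'$ with constants discarded.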

As in Section 4.2.2 we order these types lexicographically: given two $\ell\times d$ matrices $W=(w_{i,j})$ and $W'=(w'_{i,j})$, we say that the first is bigger than the second, and write $W>W'$, if $w_{1,d}>w'_{1,d}$, or $w_{1,d}=w'_{1,d}$ and $w_{1,d-1}>w'_{1,d-1}$, ..., or $w_{1,i}=w'_{1,i}$ for $i=1,\ldots,d$ and $w_{2,d}>w'_{2,d}$, and so on. As for the types of families of pairs, we have:

Lemma 5.3. Every decreasing sequence of types of families of polynomial $\ell$-tuples is stationary.

5.2. Nice families and the van der Corput operation. In this subsection we define a class of families of $\ell$-tuples of polynomials that we are going to work with in the sequel, and an important operation that preserves such families and reduces their type.

5.2.1. Nice families. Let $\mathcal{P}_1=(p_{1,1},\ldots,p_{1,m}),\ldots,\mathcal{P}_\ell=(p_{\ell,1},\ldots,p_{\ell,m})$.

Definition. We call the ordered family of polynomial $\ell$-tuples $(\mathcal{P}_1,\ldots,\mathcal{P}_\ell)$ nice if

(i) $\deg(p_{1,1})\ge\deg(p_{1,j})$ for $j=1,\ldots,m$;

(ii) $\deg(p_{1,1})>\deg(p_{i,j})$ for $i=2,\ldots,\ell$, $j=1,\ldots,m$;

(iii) $\deg(p_{1,1}-p_{1,j})>\deg(p_{i,1}-p_{i,j})$ for $i=2,\ldots,\ell$, $j=2,\ldots,m$.

(Notice that a consequence of (iii) is that $p_{1,1}-p_{1,j}$ is not constant for $j=2,\ldots,m$.)

The type of a nice family of degree $1$ has only one non-zero entry, namely $w_{1,1}$.
5.2.2. The van der Corput operation. Given a family $\mathcal{P}=(p_1,\ldots,p_m)$, $p\in\mathbb{Z}[t]$, and $h\in\mathbb{N}$, we define $S_h\mathcal{P}$ and $\mathcal{P}-p$ as in Section 4.3.2. Given a family of $\ell$-tuples of polynomials $(\mathcal{P}_1,\ldots,\mathcal{P}_\ell)$, an $\ell$-tuple $(p_1,\ldots,p_\ell)\in(\mathcal{P}_1,\ldots,\mathcal{P}_\ell)$, and $h\in\mathbb{N}$, we define the following operation

$$(p_1,\ldots,p_\ell,h)\text{-vdC}(\mathcal{P}_1,\ldots,\mathcal{P}_\ell)=(P_{1,h},\ldots,P_{\ell,h})^*$$

where

$$P_{i,h}=(S_h\mathcal{P}_i-p_i,\ \mathcal{P}_i-p_i)$$

for $i=1,\ldots,\ell$, and $*$ is the operation that removes all $\ell$-tuples of constant polynomials from a given family of polynomial $\ell$-tuples. Notice that if $(\mathcal{P}_1,\ldots,\mathcal{P}_\ell)$ is a degree $d$ family containing $m$ polynomial $\ell$-tuples, then for every $h\in\mathbb{N}$, the family $(p_1,\ldots,p_\ell,h)\text{-vdC}(\mathcal{P}_1,\ldots,\mathcal{P}_\ell)$ has degree at most $d$ and contains at most $2m$ polynomial $\ell$-tuples.

5.3. Choosing a good vdC operation. As in the case of two transformations, our objective is, starting with a nice family $(\mathcal{P}_1,\ldots,\mathcal{P}_\ell)$, to successively apply appropriate operations $(p_1,\ldots,p_\ell,h)\text{-vdC}$ in order to arrive at nice families of polynomial $\ell$-tuples whose types have $w_{1,1}$ as their only non-zero entry. This case can then be treated easily using known results that involve a single transformation.

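Lemma 5.4. Let $(\mathcal{P}_1,\ldots,\mathcal{P}_\ell)$ be a nice family with $\deg(p_{1,1})\ge 2$.

Then there exists $(p_1,\ldots,p_\ell)\in(\mathcal{P}_1,\ldots,\mathcal{P}_\ell)$ such that for every large enough $h\in\mathbb{N}$ the family $(p_1,\ldots,p_\ell,h)\text{-vdC}(\mathcal{P}_1,\ldots,\mathcal{P}_\ell)$ is nice and has strictly smaller type than $(\mathcal{P}_1,\ldots,\mathcal{P}_\ell)$.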

Proof. We remind the reader that $\mathcal{P}_i=(p_{i,1},\ldots,p_{i,m})$ for $i=1,\ldots,\ell$. For $(p_1,\ldots,p_\ell)\in(\mathcal{P}_1,\ldots,\mathcal{P}_\ell)$, the family $(p_1,\ldots,p_\ell,h)\text{-vdC}(\mathcal{P}_1,\ldots,\mathcal{P}_\ell)$ consists of vectors of polynomials of the form

$$(S_hp_{1,j}-p_1,\ldots,S_hp_{\ell,j}-p_\ell)\quad\text{or}\quad(p_{1,j}-p_1,\ldots,p_{\ell,j}-p_\ell),\qquad j=1,\ldots,m.$$
We choose $(p_1,\ldots,p_\ell)$ as follows.

If $\mathcal{P}'_\ell$ is non-empty, then we take $p_1=\cdots=p_{\ell-1}=0$ and $p_\ell$ to be a polynomial of smallest degree in $\mathcal{P}'_\ell$. Then for every $h\in\mathbb{N}$, the first $\ell-1$ rows of the type remain unchanged, and the last row gets "reduced", leading to a smaller matrix type. Similarly, if the families $\mathcal{P}'_\ell,\mathcal{P}'_{\ell-1},\ldots,\mathcal{P}'_{i+1}$ are empty, and $\mathcal{P}'_i$ is non-empty for some $2\le i\le\ell-1$, then we take $p_1=\cdots=p_{i-1}=0$ and $p_i$ to be a polynomial of smallest degree in $\mathcal{P}'_i$. Then for every $h\in\mathbb{N}$, the first $i-1$ rows of the matrix type remain unchanged, and the $i$-th row gets "reduced", leading to a smaller matrix type.

Suppose now that the families $\mathcal{P}_\ell',\mathcal{P}_{\ell-1}',\dots,\mathcal{P}_2'$ are empty. If $\mathcal{P}_1$ consists of a single polynomial, namely $p_{1,1}$, then we choose $(p_1,\dots,p_\ell)=(p_{1,1},\dots,p_{\ell,1})$ and the result follows. Therefore, we can assume that $\mathcal{P}_1$ contains some polynomial other than $p_{1,1}$. We consider two cases. If $p\sim p_{1,1}$ for all $p\in\mathcal{P}_1$, then we choose $(p_1,\dots,p_\ell)=(p_{1,1},\dots,p_{\ell,1})$. Otherwise, we choose $(p_1,\dots,p_\ell)\in(\mathcal{P}_1,\dots,\mathcal{P}_\ell)$ with $p_1\not\sim p_{1,1}$ and such that $p_1$ is a polynomial in $\mathcal{P}_1$ with smallest degree (such a choice exists since $p_{1,1}$ has the highest degree in $\mathcal{P}_1$). In all cases, for every $h\in\mathbb{N}$, the first row of the matrix type of $(p_1,\dots,p_\ell,h)\text{-vdC}(\mathcal{P}_1,\dots,\mathcal{P}_\ell)$ is "smaller" than that of $(\mathcal{P}_1,\dots,\mathcal{P}_\ell)$.

It remains to verify that for every large enough $h\in\mathbb{N}$ the family $(p_1,\dots,p_\ell,h)\text{-vdC}(\mathcal{P}_1,\dots,\mathcal{P}_\ell)$ is nice. This part is identical to the one used in Lemma 4.4, and so we omit it. □



ERGODIC AVERAGES OF COMMUTING TRANSFORMATIONS WITH DISTINCT DEGREE... 25 

The proof of the next lemma is completely analogous to the proof of Lemma 4.5 in the previous section, and so we omit it.

Lemma 5.5. Let $(\mathcal{P}_1,\dots,\mathcal{P}_\ell)$ be a nice family with degree $d\geq 2$ that contains $m$ polynomial $\ell$-tuples. Suppose that we successively apply the $(p_1,\dots,p_\ell,h)$-vdC operation for appropriate choices of $p_1,\dots,p_\ell\in\mathbb{Z}[t]$ and $h\in\mathbb{N}$, as described in the previous lemma, each time getting a nice family of $\ell$-tuples of polynomials with strictly smaller type.

Then after a finite number of operations we get, for a good set of parameters, nice families of $\ell$-tuples of polynomials of degree 1. Moreover, the number of operations needed can be bounded by a function of $d$, $\ell$, and $m$.

5.4. Proof of Proposition 5.1. Using Lemma 5.4 and Lemma 5.5, the rest of the proof of Proposition 5.1 is completely analogous to the end of the proof of Proposition 4.1 given in Section 4, and so we omit it.

6. Characteristic factors for the lower degree iterates and proof of convergence

In this section we prove Theorem 1.2 and then Theorem 1.1.

6.1. A simple example. In order to explain our method, we continue with the example of Section 4.1, studying the limiting behavior of the averages of

$$f_1(T_1^{n^2}x)\cdot f_2(T_2^{n}x).\tag{28}$$

We have shown that these averages converge to 0 in $L^2(\mu)$ whenever $f_1\perp\mathcal{Z}_{2,T_1}$. We are therefore reduced to studying these averages under the additional hypothesis that $f_1$ is measurable with respect to $\mathcal{Z}_{2,T_1}$.

Using the approximation property of Proposition 3.1 we further reduce matters to the case where, for $\mu$-almost every $x\in X$, the sequence $(f_1(T_1^nx))_{n\in\mathbb{N}}$ is a 2-step nilsequence. Therefore, the sequence $(f_1(T_1^{n^2}x))_{n\in\mathbb{N}}$ is a 4-step nilsequence. We are left with studying the limiting behavior of the averages of

$$u_n(x)\cdot f_2(T_2^nx),$$

where $(u_n)_{n\in\mathbb{N}}$ is a uniformly bounded sequence of $\mu$-measurable functions, such that $(u_n(x))_{n\in\mathbb{N}}$ is a 4-step nilsequence for $\mu$-almost every $x\in X$.

In this particular case, Corollary 6.3 below suffices to show that the averages converge to 0 in $L^2(\mu)$ whenever $f_2\perp\mathcal{Z}_{4,T_2}$. (For more intricate averages we need more elaborate results about weighted multiple averages.)

We are reduced to the case where $f_1$ is measurable with respect to $\mathcal{Z}_{2,T_1}$ and $f_2$ is measurable with respect to $\mathcal{Z}_{4,T_2}$. Applying Proposition 3.1 to these two functions, we reduce matters to the case where, for $\mu$-almost every $x\in X$, the sequences $(f_1(T_1^nx))_{n\in\mathbb{N}}$ and $(f_2(T_2^nx))_{n\in\mathbb{N}}$ are finite step nilsequences. Therefore, for $\mu$-almost every $x\in X$, the sequence (28) is a nilsequence, and as a consequence its averages converge.

We now introduce the tools that we need to carry out the previous plan in our more general setup.
6.2. Uniformity seminorms. We follow [25]. Let $k\in\mathbb{N}$ be an integer. Let $(a_n)_{n\in\mathbb{Z}}$ be a bounded sequence of real numbers and $\mathbf{I}=(I_N)_{N\in\mathbb{N}}$ be a sequence of intervals whose lengths $|I_N|$ tend to infinity. We say that this sequence of intervals is $k$-adapted to the sequence $(a_n)$ if for every $h=(h_1,\dots,h_k)\in\mathbb{N}^k$, the limit

$$c_h(\mathbf{I},a):=\lim_{N\to\infty}\frac{1}{|I_N|}\sum_{n\in I_N}\prod_{\epsilon\in\{0,1\}^k}a_{n+h_1\epsilon_1+\dots+h_k\epsilon_k}$$

exists.¹ Clearly, every sequence of intervals whose lengths tend to infinity admits a subsequence which is $k$-adapted to the sequence $(a_n)$.

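Suppose that $\mathbf{I}=(I_N)_{N\in\mathbb{N}}$ is $k$-adapted to $(a_n)_{n\in\mathbb{Z}}$. We define

$$|||a|||_{\mathbf{I},k}:=\Big(\lim_{H\to\infty}\frac{1}{H^k}\sum_{1\leq h_1,\dots,h_k\leq H}c_h(\mathbf{I},a)\Big)^{1/2^k}.$$

By Proposition 2.2 of [25], the above limit exists and is non-negative.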
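For intuition, $c_h(\mathbf{I},a)$ and $|||a|||_{\mathbf{I},k}$ can be approximated on a finite window. The Python sketch below is a heuristic proxy only: it assumes $I_N=[0,N)$, stops at fixed values of $N$ and $H$ instead of taking the limits, and the function names are ours.

```python
from itertools import product


def box_average(a, h, N):
    """Finite-N proxy for c_h(I, a) with I_N = [0, N): average over n of the
    product of a(n + eps_1 h_1 + ... + eps_k h_k) over eps in {0,1}^k."""
    total = 0.0
    for n in range(N):
        prod = 1.0
        for eps in product((0, 1), repeat=len(h)):
            prod *= a(n + sum(e * hi for e, hi in zip(eps, h)))
        total += prod
    return total / N


def seminorm(a, k, N, H):
    """Finite proxy for |||a|||_{I,k}: average c_h over 1 <= h_1,...,h_k <= H,
    then take the 2^k-th root (negative round-off is clipped to 0)."""
    boxes = product(range(1, H + 1), repeat=k)
    avg = sum(box_average(a, h, N) for h in boxes) / H ** k
    return max(avg, 0.0) ** (1.0 / 2 ** k)
```

For $a_n=(-1)^n$ every product over a 2-dimensional box equals 1, so the proxy for $|||a|||_{\mathbf{I},2}$ is exactly `1.0`, while for $k=1$ and even $H$ the values $c_h=(-1)^h$ cancel and the proxy is `0.0`.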

Lemma 6.1. Let $(X,\mathcal{X},\mu,T)$ be a system, $f\in L^\infty(\mu)$, and $\mathbf{I}=(I_N)_{N\in\mathbb{N}}$ be a sequence of intervals whose lengths tend to infinity. Suppose that $f\perp\mathcal{Z}_{k-1,T}$ for some $k\geq 2$.

Then the sequence $\mathbf{I}$ admits a subsequence $\mathbf{I}'=(I'_N)_{N\in\mathbb{N}}$ such that, for $\mu$-almost every $x\in X$, $\mathbf{I}'$ is $k$-adapted to the sequence $(f(T^nx))_{n\in\mathbb{N}}$ and $|||(f(T^nx))_{n\in\mathbb{N}}|||_{\mathbf{I}',k}=0$.

Proof. Let $\mu=\int\mu_x\,d\mu(x)$ be the ergodic decomposition of $\mu$. For $x\in X$, we write $a(x)=(a_n(x))_{n\in\mathbb{N}}$ for the sequence defined by $a_n(x)=f(T^nx)$.

By the Ergodic Theorem, for every $h=(h_1,\dots,h_k)\in\mathbb{N}^k$, the averages

$$\frac{1}{|I_N|}\sum_{n\in I_N}\prod_{\epsilon\in\{0,1\}^k}a_{n+h_1\epsilon_1+\dots+h_k\epsilon_k}(x)$$

converge in $L^2(\mu)$. As a consequence, a subsequence of this sequence of averages converges $\mu$-almost everywhere. This subsequence depends on the parameter $h$, but since there are only countably many such parameters, by a diagonal argument we can find a subsequence $\mathbf{I}'=(I'_N)_{N\in\mathbb{N}}$ such that for $\mu$-almost every $x\in X$ the limit

$$c_h(\mathbf{I}',a(x))=\lim_{N\to\infty}\frac{1}{|I'_N|}\sum_{n\in I'_N}\prod_{\epsilon\in\{0,1\}^k}a_{n+h_1\epsilon_1+\dots+h_k\epsilon_k}(x)\tag{29}$$

exists for every choice of $h=(h_1,\dots,h_k)\in\mathbb{N}^k$. This means that, for $\mu$-almost every $x\in X$, the sequence of intervals $\mathbf{I}'$ is $k$-adapted to the sequence $(a_n(x))_{n\in\mathbb{N}}$.

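Furthermore, by the Ergodic Theorem, for every $h\in\mathbb{N}^k$ the averages on the right hand side of (29) converge in $L^2(\mu)$ to

$$\mathbb{E}\Big(\prod_{\epsilon\in\{0,1\}^k}T^{h_1\epsilon_1+\dots+h_k\epsilon_k}f\ \Big|\ \mathcal{I}(T)\Big)(x)=\int\prod_{\epsilon\in\{0,1\}^k}T^{h_1\epsilon_1+\dots+h_k\epsilon_k}f\,d\mu_x.$$

Therefore, for $\mu$-almost every $x\in X$, we have

$$c_h(\mathbf{I}',a(x))=\int\prod_{\epsilon\in\{0,1\}^k}T^{h_1\epsilon_1+\dots+h_k\epsilon_k}f\,d\mu_x.$$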



¹In [25] it is assumed that the limit exists for $h\in\mathbb{Z}^k$, but this does not change anything in the proofs.




Taking the average in $h$, using the definition of $|||f|||_{k,\mu_x}$ (Section 3.1) and (15), we get for $\mu$-almost every $x\in X$ that

$$|||a(x)|||_{\mathbf{I}',k}=|||f|||_{k,\mu_x}.$$

Since by hypothesis $\mathbb{E}_\mu(f|\mathcal{Z}_{k-1})=0$, by (11) we have $|||f|||_{k,\mu}=0$, and as a consequence $|||f|||_{k,\mu_x}=0$ for $\mu$-almost every $x\in X$ by (9). This completes the proof. □

We are also going to use the following result:

Theorem 6.2 ([25], Corollary 2.14). Let $(a_n)_{n\in\mathbb{N}}$ be a bounded sequence of real numbers, and $\mathbf{I}=(I_N)_{N\in\mathbb{N}}$ be a sequence of intervals that is $k$-adapted to this sequence for some $k\geq 2$. Suppose that $|||a|||_{\mathbf{I},k}=0$.

Then for every bounded $(k-1)$-step nilsequence $(u_n)$ we have

$$\lim_{N\to\infty}\frac{1}{|I_N|}\sum_{n\in I_N}a_nu_n=0.$$

Combining the results of this section, we can now prove:

Corollary 6.3. Let $(X,\mathcal{X},\mu,T)$ be a system and $f\in L^\infty(\mu)$. Let $(u_n(x))_{n\in\mathbb{N}}$ be a uniformly bounded sequence of $\mu$-measurable functions such that, for $\mu$-almost every $x\in X$, the sequence $(u_n(x))_{n\in\mathbb{N}}$ is a $k$-step nilsequence for some $k\geq 1$. Suppose that $f\perp\mathcal{Z}_{k,T}$.

Then the averages

$$\frac{1}{N-M}\sum_{n=M}^{N-1}f(T^nx)\cdot u_n(x)$$

converge to 0 in $L^2(\mu)$.

Proof. It suffices to prove that every sequence of intervals $\mathbf{I}=(I_N)_{N\in\mathbb{N}}$ whose lengths tend to infinity admits a subsequence $\mathbf{I}'=(I'_N)_{N\in\mathbb{N}}$ such that

$$\frac{1}{|I'_N|}\sum_{n\in I'_N}f(T^nx)\cdot u_n(x)\to 0\ \text{ in } L^2(\mu).\tag{30}$$

Let $\mathbf{I}'$ be given by Lemma 6.1 (with $k$ in place of $k-1$). For $\mu$-almost every $x\in X$ we have $|||(f(T^nx))_{n\in\mathbb{N}}|||_{\mathbf{I}',k+1}=0$. Theorem 6.2 gives that the averages in (30) converge to 0 pointwise, and the asserted convergence to 0 in $L^2(\mu)$ follows from the bounded convergence theorem. This completes the proof. □

6.3. Some weighted averages. We are going to prove Theorem 1.2 by induction on the number of transformations involved. The next result is going to help us carry out the induction step.

Proposition 6.4. Let $(X,\mathcal{X},\mu,T_1,\dots,T_\ell)$ be a system and $f_1,\dots,f_\ell\in L^\infty(\mu)$. Let $p_1,\dots,p_\ell\in\mathbb{Z}[t]$ be polynomials with distinct degrees and highest degree $d=\deg(p_1)$. Let $(u_n(x))_{n\in\mathbb{N}}$ be a uniformly bounded sequence of $\mu$-measurable functions such that, for $\mu$-almost every $x\in X$, the sequence $(u_n(x))_{n\in\mathbb{N}}$ is an $s$-step nilsequence for some $s\geq 1$.

Then there exists $k=k(d,\ell,s)\in\mathbb{N}$ such that: If $f_1\perp\mathcal{Z}_{k,T_1}$, then the averages

$$\frac{1}{N-M}\sum_{n=M}^{N-1}f_1(T_1^{p_1(n)}x)\cdot\ldots\cdot f_\ell(T_\ell^{p_\ell(n)}x)\cdot u_n(x)$$

converge to 0 in $L^2(\mu)$.

Proof. First suppose that $\deg(p_1)=1$. Then the polynomials $p_2,\dots,p_\ell$ are all constant. The polynomial $p_1$ has the form $p_1(n)=an+b$ for some integers $a,b$ with $a\neq 0$. Applying Corollary 6.3 for $T_1^bf_1$ in place of $f$ and $T_1^a$ in place of $T$, and using (13), we get the announced result with $k=s+1$.

Therefore, we can assume that $\deg(p_1)\geq 2$. The strategy of the proof is the same as in Corollary 6.3, but instead of the Ergodic Theorem used in the proof of Lemma 6.1 we use Proposition 5.1.

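We assume that $f_1\perp\mathcal{Z}_{k,T_1}$, where $k$ is the integer $k(d,\ell,2^s\ell)$ given by Proposition 5.1. In order to prove the announced convergence to 0, it suffices to show that every sequence of intervals $\mathbf{I}=(I_N)_{N\in\mathbb{N}}$ whose lengths tend to infinity admits a subsequence $\mathbf{I}'=(I'_N)_{N\in\mathbb{N}}$ such that

$$\frac{1}{|I'_N|}\sum_{n\in I'_N}f_1(T_1^{p_1(n)}x)\cdot\ldots\cdot f_\ell(T_\ell^{p_\ell(n)}x)\cdot u_n(x)\ \text{ converges to 0 in } L^2(\mu).\tag{31}$$

We let $m=2^s$, and for $x\in X$, let $a(x)=(a_n(x))_{n\in\mathbb{Z}}$ be the sequence given by

$$a_n(x)=f_1(T_1^{p_1(n)}x)\cdot\ldots\cdot f_\ell(T_\ell^{p_\ell(n)}x).$$

For $r_1,\dots,r_m\in\mathbb{Z}$, we study the averages

$$\frac{1}{|I_N|}\sum_{n\in I_N}a_{n+r_1}(x)\cdot\ldots\cdot a_{n+r_m}(x).$$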

Consider the following $\ell$ ordered families of polynomials, each consisting of $\ell m$ polynomials:

$$\begin{aligned}
\mathcal{P}_1&=(p_1(n+r_1),\dots,p_1(n+r_m),\,0,\dots,0,\,\dots,\,0,\dots,0),\\
\mathcal{P}_2&=(0,\dots,0,\,p_2(n+r_1),\dots,p_2(n+r_m),\,\dots,\,0,\dots,0),\\
&\ \ \vdots\\
\mathcal{P}_\ell&=(0,\dots,0,\,0,\dots,0,\,\dots,\,p_\ell(n+r_1),\dots,p_\ell(n+r_m)).
\end{aligned}$$

Using that $\deg(p_1)\geq 2$ and $\deg(p_i)<\deg(p_1)$ for $i=2,\dots,\ell$, it is easy to check that this family is nice unless $r_1\in\{r_2,\dots,r_m\}$.

Using Proposition 5.1 (with $k=k(d,\ell,2^s\ell)$) we have that the averages

$$\frac{1}{|I_N|}\sum_{n\in I_N}a_{n+r_1}(x)\cdot\ldots\cdot a_{n+r_m}(x)$$

converge to 0 in $L^2(\mu)$ for every $r_1,\dots,r_m\in\mathbb{Z}$ with $r_1\notin\{r_2,\dots,r_m\}$. As in the proof of Lemma 6.1, there exists a subsequence $\mathbf{I}'=(I'_N)_{N\in\mathbb{N}}$ of the sequence of intervals $\mathbf{I}$ such that

$$\frac{1}{|I'_N|}\sum_{n\in I'_N}a_{n+r_1}(x)\cdot\ldots\cdot a_{n+r_m}(x)\to 0\quad\mu\text{-almost everywhere}$$

for all choices of $r_1,\dots,r_m\in\mathbb{Z}$ with $r_1\notin\{r_2,\dots,r_m\}$.
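In particular, for every $h_1,\dots,h_s\in\mathbb{N}$, we have

$$\frac{1}{|I'_N|}\sum_{n\in I'_N}\prod_{\epsilon\in\{0,1\}^s}a_{n+\epsilon_1h_1+\dots+\epsilon_sh_s}(x)\to 0\quad\mu\text{-almost everywhere}.$$

To see this, apply the previous convergence property when $(r_1,\dots,r_m)$ is an enumeration of the $m=2^s$ numbers $\epsilon_1h_1+\dots+\epsilon_sh_s$, $\epsilon\in\{0,1\}^s$, with $r_1=0$ corresponding to $\epsilon=(0,\dots,0)$. Since $h_1,\dots,h_s\geq 1$, all the other $r_j$ are positive, so indeed $r_1\notin\{r_2,\dots,r_m\}$.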

As a consequence, for $\mu$-almost every $x\in X$, the sequence $\mathbf{I}'$ of intervals is $s$-adapted to the sequence $a(x)$, and $c_h(\mathbf{I}',a(x))=0$ for every $h\in\mathbb{N}^s$. Therefore, $|||a(x)|||_{\mathbf{I}',s}=0$ for $\mu$-almost every $x\in X$. By Theorem 6.2 we have

$$\frac{1}{|I'_N|}\sum_{n\in I'_N}a_n(x)\cdot u_n(x)\to 0\quad\mu\text{-almost everywhere},$$

and (31) follows by the bounded convergence theorem. This completes the proof. □



6.4. Proof of Theorem 1.2. We are now ready to prove Theorem 1.2. It is a special case (take $u_n$ to be constant) of the following result:

Theorem 6.5. Let $(X,\mathcal{X},\mu,T_1,\dots,T_\ell)$ be a system and $f_1,\dots,f_\ell\in L^\infty(\mu)$. Let $p_1,\dots,p_\ell$ be polynomials with distinct degrees and maximum degree $d$. Let $(u_n(x))_{n\in\mathbb{N}}$ be a uniformly bounded sequence of measurable functions on $X$ such that, for $\mu$-almost every $x\in X$, the sequence $(u_n(x))_{n\in\mathbb{N}}$ is an $s$-step nilsequence.

Then there exists $k=k(d,\ell,s)$ with the following property: If $f_i\perp\mathcal{Z}_{k,T_i}$ for some $i\in\{1,\dots,\ell\}$, then the averages

$$\frac{1}{N-M}\sum_{n=M}^{N-1}f_1(T_1^{p_1(n)}x)\cdot\ldots\cdot f_\ell(T_\ell^{p_\ell(n)}x)\cdot u_n(x)\tag{32}$$

converge to 0 in $L^2(\mu)$.

Proof. The proof goes by induction on the number $\ell$ of transformations. For $\ell=1$, the result is the case $\ell=1$ of Proposition 6.4. We take $\ell\geq 2$, assume that the result holds for $\ell-1$ transformations, and we are going to prove that it holds for $\ell$ transformations.

Without loss of generality we can assume that $\deg(p_1)=d>\deg(p_i)$ for $2\leq i\leq\ell$. By Proposition 6.4 there exists $k_0=k_0(d,\ell,s)$ such that, if $f_1\perp\mathcal{Z}_{k_0,T_1}$, then the averages (32) converge to 0 in $L^2(\mu)$. Therefore we can restrict ourselves to the case where the function $f_1$ is measurable with respect to $\mathcal{Z}_{k_0,T_1}$.

By Proposition 3.1, for every $\varepsilon>0$ there exists $\tilde{f}_1\in L^\infty(\mu)$, measurable with respect to $\mathcal{Z}_{k_0,T_1}$, with $\|f_1-\tilde{f}_1\|_{L^2(\mu)}<\varepsilon$, and such that $(\tilde{f}_1(T_1^nx))_{n\in\mathbb{N}}$ is a $k_0$-step nilsequence for $\mu$-almost every $x\in X$. By density, it suffices to prove the result under the additional hypothesis that

$(f_1(T_1^nx))_{n\in\mathbb{N}}$ is a $k_0$-step nilsequence for $\mu$-almost every $x\in X$.

Then for $\mu$-almost every $x\in X$, the sequence $(f_1(T_1^{p_1(n)}x))_{n\in\mathbb{N}}$ is a $(dk_0)$-step nilsequence. The sequence $(f_1(T_1^{p_1(n)}x)\cdot u_n(x))_{n\in\mathbb{N}}$ is the product of two $s'$-step nilsequences, where $s'=\max(dk_0,s)$, and thus it is an $s'$-step nilsequence. Therefore, the announced result follows from the induction hypothesis, applied to the $\ell-1$ transformations $T_2,\dots,T_\ell$ with this sequence as the weight. This completes the induction and the proof. □



QING CHU, NIKOS FRANTZIKINAKIS, AND BERNARD HOST 



6.5. Proof of Theorem 1.1. Let $(X,\mathcal{X},\mu,T_1,\dots,T_\ell)$ be a system and $f_1,\dots,f_\ell\in L^\infty(\mu)$. We assume that the polynomials $p_1,\dots,p_\ell\in\mathbb{Z}[t]$ have distinct degrees and we want to show that the averages

$$\frac{1}{N-M}\sum_{n=M}^{N-1}f_1(T_1^{p_1(n)}x)\cdot\ldots\cdot f_\ell(T_\ell^{p_\ell(n)}x)\tag{33}$$

converge in $L^2(\mu)$.

By Theorem 1.2 there exists $k\in\mathbb{N}$ such that the averages (33) converge to 0 whenever $f_i\perp\mathcal{Z}_{k,T_i}$ for some $i\in\{1,\dots,\ell\}$. Therefore, we can assume that for $i=1,\dots,\ell$, the function $f_i$ is measurable with respect to $\mathcal{Z}_{k,T_i}$.

By Proposition 3.1, for every $\varepsilon>0$, and for $i=1,\dots,\ell$, there exists a function $\tilde{f}_i\in L^\infty(\mu)$, measurable with respect to $\mathcal{Z}_{k,T_i}$, with $\|f_i-\tilde{f}_i\|_{L^2(\mu)}<\varepsilon$, and such that $(\tilde{f}_i(T_i^nx))_{n\in\mathbb{N}}$ is a $k$-step nilsequence for $\mu$-almost every $x\in X$.

By density we can therefore assume that, for $i=1,\dots,\ell$, and for $\mu$-almost every $x\in X$, $(f_i(T_i^nx))_{n\in\mathbb{N}}$ is a $k$-step nilsequence, and as a consequence $(f_i(T_i^{p_i(n)}x))_{n\in\mathbb{N}}$ is a $(dk)$-step nilsequence, where $d$ is the maximum of the degrees of $p_1,\dots,p_\ell$. Then for $\mu$-almost every $x\in X$, the average (33) is an average of a $(dk)$-step nilsequence, and therefore it converges by [28]. This completes the proof. □



7. Lower bounds for powers

In this section we are going to prove Theorem 1.3.

We remark that a consequence of Theorem 1.1 is that all the limits of multiple ergodic averages mentioned in this section exist (in $L^2(\mu)$). As a result, we are allowed to write $\lim_{N-M\to\infty}$ where $\limsup_{N-M\to\infty}$ would otherwise have to be used.

We start with some background material.

7.1. Equidistribution properties on nilmanifolds. We summarize some notions and results that will be needed later.

Polynomial sequences. Let $G$ be a nilpotent Lie group. Let $X=G/\Gamma$ be a nilmanifold, where $\Gamma$ is a discrete cocompact subgroup of $G$. Recall that for $a\in G$ we write $T_a\colon X\to X$ for the translation $x\mapsto ax$.

If $a_1,\dots,a_\ell\in G$ and $p_1,\dots,p_\ell\in\mathbb{Z}[t]$, then a sequence of the form $g(n)=a_1^{p_1(n)}a_2^{p_2(n)}\cdots a_\ell^{p_\ell(n)}$ is called a polynomial sequence in $G$. If $x\in X$ and $(g(n))_{n\in\mathbb{N}}$ is a polynomial sequence in $G$, then the sequence $(g(n)x)_{n\in\mathbb{N}}$ is called a polynomial sequence in $X$.

Sub-nilmanifolds. If $H$ is a closed subgroup of $G$ and $x\in X$, then $Hx$ may not be a closed subset of $X$ (for example, take $X=\mathbb{R}/\mathbb{Z}$, $x=\mathbb{Z}$, and $H=\{k\sqrt{2}\colon k\in\mathbb{Z}\}$), but if it is closed, then the compact set $Hx$ can be given the structure of a nilmanifold ([28]). More precisely, if $x=g\Gamma$, then $Hx$ is closed if and only if $\Lambda=H\cap g\Gamma g^{-1}$ is cocompact in $H$. In this case $Hx\simeq H/\Lambda$, and $h\mapsto hg\Gamma$ induces the isomorphism from $H/\Lambda$ onto $Hx$. We call any such set $Hx$ a sub-nilmanifold of $X$.
Equidistribution. We say that the sequence $(g(n)x)_{n\in\mathbb{N}}$, with values in a nilmanifold $X$, is equidistributed (or well distributed) in a sub-nilmanifold $Y$ of $X$, if for every $F\in C(X)$ we have

$$\lim_{N-M\to\infty}\frac{1}{N-M}\sum_{n=M}^{N-1}F(g(n)x)=\int F\,dm_Y,$$

where $m_Y$ denotes the normalized Haar measure on $Y$.
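On the simplest nilmanifold, the circle $\mathbb{T}=\mathbb{R}/\mathbb{Z}$, equidistribution can be tested with Weyl sums: a sequence $(x_n)$ is equidistributed in $\mathbb{T}$ if and only if the averages of $e(kx_n)$ tend to 0 for every non-zero integer $k$, where $e(t)=\exp(2\pi it)$. The Python sketch below is our own finite-$N$ illustration (averages over $0\leq n<N$ stand in for the limit); it contrasts the polynomial sequence $n^2\sqrt{2}$, which is equidistributed, with $n^2/3$, which is not.

```python
import cmath
import math


def e(t):
    """e(t) = exp(2*pi*i*t)."""
    return cmath.exp(2j * math.pi * t)


def weyl_average(seq, k, N):
    """|(1/N) * sum_{0 <= n < N} e(k * seq(n))|; seq(n) is reduced mod 1
    first, which does not change e(k * seq(n)) for integer k."""
    return abs(sum(e(k * (seq(n) % 1.0)) for n in range(N))) / N


def irrational(n):
    return n * n * math.sqrt(2)  # n^2 * sqrt(2), equidistributed mod 1


def rational(n):
    return n * n / 3  # n^2 / 3 mod 1 only takes the values 0 and 1/3
```

With $k=1$, the first sequence gives a small average once $N$ is large, while the second gives the modulus of $\tfrac13+\tfrac23e(1/3)$, namely $1/\sqrt{3}$, whenever $N$ is a multiple of 3.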
For typographical reasons, we use the following notation:

Notation. If $E$ is a subset of $X$, we denote by $\mathrm{cl}_X(E)$ the closure of $E$ in $X$.

A fact that we are going to use repeatedly is that polynomial sequences are equidistributed in their orbit closure. More precisely:

Theorem 7.1 ([28]). Let $X=G/\Gamma$ be a nilmanifold, $(g(n)x)_{n\in\mathbb{N}}$ be a polynomial sequence in $X$, and $Y=\mathrm{cl}_X\{g(n)x,\ n\in\mathbb{N}\}$.

(i) There exists $r\in\mathbb{N}$ such that the sequence $(g(rn)x)_{n\in\mathbb{N}}$ is equidistributed on some connected component of $Y$.

(ii) If $Y$ is connected, then $Y$ is a sub-nilmanifold of $X$, and for every $r\in\mathbb{N}$ the sequence $(g(rn)x)_{n\in\mathbb{N}}$ is equidistributed on $Y$.

Ergodic elements. An element $a\in G$ is ergodic, or acts ergodically on $X$, if the sequence $(a^n\Gamma)_{n\in\mathbb{N}}$ is dense in $X$.

Suppose that $a\in G$ acts ergodically on $X$. Then for every $x\in X$ the sequence $(a^nx)_{n\in\mathbb{N}}$ is equidistributed in $X$. If $X$ is assumed to be connected, then for every $r\in\mathbb{N}$ the element $a^r$ also acts ergodically on $X$ (this follows from part (ii) of Theorem 7.1). For general nilmanifolds $X$ we can easily deduce the following result (with $X_0$ we denote the connected component of the element $\Gamma$): There exists $r_0\in\mathbb{N}$ such that the nilmanifold $X$ is the disjoint union of the sub-nilmanifolds $X_i=a^iX_0$, $i=1,\dots,r_0$, and $a^r$ acts ergodically on each $X_i$ for every $r\in r_0\mathbb{N}$.

The affine torus. If $X=G/\Gamma$ is a connected nilmanifold, the affine torus of $X$ is defined to be the homogeneous space $A=G/([G_0,G_0]\Gamma)$, where by $G_0$ we denote the connected component of the identity element in $G$. The homogeneous space $A$ can be smoothly identified in a natural way with the nilmanifold $G_0/([G_0,G_0](\Gamma\cap G_0))$, which is a finite dimensional torus, say $\mathbb{T}^m$ for some $m\in\mathbb{N}$. It is known ([15]) that, under this identification, $G$ acts on $A$ by unipotent affine transformations. This means that every $T_g\colon\mathbb{T}^m\to\mathbb{T}^m$ has the form $T_gx=Sx+b$, for some unipotent homomorphism $S$ of $\mathbb{T}^m$ and $b\in\mathbb{T}^m$.

Equidistribution criterion. If $X=G/\Gamma$ is a nilmanifold, then $X$ is connected if and only if $G=G_0\Gamma$. In the sequel we need to establish some equidistribution properties of polynomial sequences on nilmanifolds. The next criterion is going to simplify our task:

Theorem 7.2 ([28]). Let $X=G/\Gamma$ be a connected nilmanifold, $(g(n))_{n\in\mathbb{N}}$ be a polynomial sequence in $G$, and $x\in X$. Let $A=G/([G_0,G_0]\Gamma)$ be the affine torus of $X$ and $\pi_A\colon X\to A$ be the natural projection.

Then the sequence $(g(n)x)_{n\in\mathbb{N}}$ is equidistributed in $X$ if and only if the sequence $(g(n)\pi_A(x))_{n\in\mathbb{N}}$ is equidistributed in $A$.




7.2. An example. In order to explain the strategy of the proof of Theorem 1.3 we use an example. Our goal is to show that for a given system $(X,\mathcal{X},\mu,T_1,T_2)$ and set $A\in\mathcal{X}$, for every $\varepsilon>0$ we have

$$\mu(A\cap T_1^{-n}A\cap T_2^{-n^2}A)>\mu(A)^3-\varepsilon$$

for a set of $n\in\mathbb{N}$ that has bounded gaps.

After some manipulations that are explained in Section 7.3 we are left with showing that if $f_1\perp\mathcal{K}_{\mathrm{rat}}(T_1)$ or $f_2\perp\mathcal{K}_{\mathrm{rat}}(T_2)$, then the averages of

$$f_1(T_1^{n}x)\cdot f_2(T_2^{n^2}x)\tag{34}$$

converge to 0 in $L^2(\mu)$. In fact, we are only going to be able to prove a somewhat more technical variation of this property (see Proposition 7.3), but the exact details are not important at this point.

By Theorem 1.2 we can assume that the function $f_1$ is $\mathcal{Z}_{k,T_1}$-measurable and the function $f_2$ is $\mathcal{Z}_{k,T_2}$-measurable for some $k\in\mathbb{N}$. For convenience, we also assume that the transformation $T_1$ is totally ergodic (meaning $T_1^r$ is ergodic for every $r\in\mathbb{N}$). In this case, using Theorem 2.1 and an approximation argument, we can further reduce matters to the case where $X$ is a connected nilmanifold, $\mu=m_X$, and $T_1=T_a$ is an ergodic translation on $X$. The assumption that $X$ is connected is important, and is a consequence of our simplifying assumption that the transformation $T_1$ is totally ergodic. Also, by Proposition 3.1 we can assume that for $m_X$-almost every $x\in X$ the sequence $u_n(x)=f_2(T_2^nx)$ is a finite step nilsequence.

After doing all these maneuvers our new goal becomes to establish the following result:

(a) Let $X$ be a connected nilmanifold, $a$ be an ergodic translation of $X$, and $\int f_1\,dm_X=0$. Let $(u_n)_{n\in\mathbb{N}}$ be a uniformly bounded sequence of measurable functions such that $(u_n(x))_{n\in\mathbb{N}}$ is a nilsequence for $m_X$-almost every $x\in X$. Then the averages of

$$f_1(a^nx)\cdot u_{n^2}(x)$$

converge to 0 in $L^2(m_X)$. (The conclusion fails if $X$ is not connected.)
It is easy to see that (a) follows from the following result:

(a)′ Let $X$ be a connected nilmanifold and $a$ be an ergodic translation of $X$. Then for $m_X$-almost every $x\in X$ we have: for every nilmanifold $Y$, every ergodic translation $b$ of $Y$, and every $y\in Y$, the sequence

$$(a^nx,\,b^{n^2}y)$$

is equidistributed on the nilmanifold $X\times Y$.

We prove a variation of this result that suffices for our purposes in Lemma 7.6. This is the heart of our argument, and we prove it by (i) showing that it suffices to verify the announced equidistribution property when each translation $a$ and $b$ is given by an ergodic unipotent affine transformation on some finite dimensional torus, and then (ii) verifying the announced equidistribution property for affine transformations using direct computations (see Lemma 7.5). It is in this second step that we make crucial use of the special structure of our polynomial iterates; our argument does not quite work for some other distinct degree polynomial iterates, like $n$ and $n^2+n$. The key observation is that since all the coordinates of the sequence $(a^nx)$ have non-trivial linear part, and those of $(b^{n^2}y)$ have trivial linear part, for typical values of $x\in X$ it is impossible for the coordinates of the sequences $(a^nx)$ and $(b^{n^2}y)$ to "conspire" and complicate the equidistribution properties of the sequence $(a^nx,b^{n^2}y)$.
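A toy case of (a)′ can be probed numerically on $X=Y=\mathbb{T}$ with the rotations $a\colon x\mapsto x+\alpha$ and $b\colon y\mapsto y+\beta$, so that $a^nx=x+n\alpha$ and $b^{n^2}y=y+n^2\beta$. By Weyl's criterion on $\mathbb{T}^2$, equidistribution of $(a^nx,b^{n^2}y)$ amounts to the averages of $e(k_1(x+n\alpha)+k_2(y+n^2\beta))$ tending to 0 for every $(k_1,k_2)\neq(0,0)$. The Python sketch below checks finitely many characters for one finite $N$; the values $\alpha=\sqrt{2}$, $\beta=(1+\sqrt{5})/2$, the point $(x,y)$, and the cutoffs are arbitrary choices of ours, and small values are evidence, not a proof.

```python
import cmath
import math


def joint_character_average(k1, k2, alpha, beta, x, y, N):
    """|(1/N) sum_{n<N} e(k1 (x + n alpha) + k2 (y + n^2 beta))|."""
    total = 0j
    for n in range(N):
        phase = (k1 * (x + n * alpha) + k2 * (y + n * n * beta)) % 1.0
        total += cmath.exp(2j * math.pi * phase)
    return abs(total) / N


def worst_character(alpha, beta, x, y, N, K=2):
    """Largest average over the characters (k1, k2) != (0, 0), |k1|, |k2| <= K."""
    return max(
        joint_character_average(k1, k2, alpha, beta, x, y, N)
        for k1 in range(-K, K + 1)
        for k2 in range(-K, K + 1)
        if (k1, k2) != (0, 0)
    )
```

If instead $\beta=1/3$, the character $(k_1,k_2)=(0,3)$ makes every term equal and the corresponding average has modulus 1; such a $b$ is not ergodic, so it is excluded by the hypotheses of (a)′.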
If the transformation $T_1$ is ergodic but not totally ergodic, then further technical issues arise, but they are not hard to overcome. If $T_1$ is not ergodic, then it is possible to use its ergodic decomposition, and the previously established result for each ergodic component, to deduce the result for $T_1$. Finally, if $f_2\perp\mathcal{K}_{\mathrm{rat}}(T_2)$, we first use the previously established result to reduce matters to the case where the function $f_1$ is $\mathcal{K}_{\mathrm{rat}}(T_1)$-measurable, and then it becomes an easy matter to show that the averages of (34) converge to 0 in $L^2(\mu)$.

7.3. Proof of Theorem 1.3 modulo a convergence result. We are going to derive Theorem 1.3 from the following result (that will be proved in the next subsection):

Proposition 7.3. Let $(X,\mathcal{X},\mu,T_1,\dots,T_\ell)$ be a system. Let $d_1,\dots,d_\ell\in\mathbb{N}$ be distinct and $f_1,\dots,f_\ell\in L^\infty(\mu)$. Suppose that $f_i\perp\mathcal{K}_{\mathrm{rat}}(T_i)$ for some $i\in\{1,\dots,\ell\}$.

Then for every $\varepsilon>0$, there exists $r_0\in\mathbb{N}$ such that for every $r\in r_0\mathbb{N}$ we have

$$\lim_{N-M\to\infty}\Big\|\frac{1}{N-M}\sum_{n=M}^{N-1}f_1(T_1^{(rn)^{d_1}}x)\cdot\ldots\cdot f_\ell(T_\ell^{(rn)^{d_\ell}}x)\Big\|_{L^2(\mu)}<\varepsilon.\tag{35}$$

(The existence of the limit is given by Theorem 1.1.)



Remark. The conclusion should hold with $r_0=1$ and $\varepsilon=0$, but we currently do not see how to show this.

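We are also going to need the next inequality; it is proved by an appropriate application of Hölder's inequality:

Lemma 7.4 ([10]). Let $\ell\in\mathbb{N}$, $(X,\mathcal{X},\mu)$ be a probability space, $\mathcal{X}_1,\mathcal{X}_2,\dots,\mathcal{X}_\ell$ be sub-$\sigma$-algebras of $\mathcal{X}$, and $f\in L^\infty(\mu)$ be non-negative.

Then

$$\int f\cdot\mathbb{E}(f|\mathcal{X}_1)\cdot\mathbb{E}(f|\mathcal{X}_2)\cdot\ldots\cdot\mathbb{E}(f|\mathcal{X}_\ell)\,d\mu\geq\Big(\int f\,d\mu\Big)^{\ell+1}.$$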
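Lemma 7.4 is easy to test on a finite probability space. In the Python sketch below (our own illustration) $X$ is a finite set with the uniform measure, each $\mathcal{X}_i$ is generated by a random partition of $X$, so that $\mathbb{E}(f|\mathcal{X}_i)$ replaces $f$ by its average on each cell, and both sides of the inequality are compared for random non-negative $f$. This only checks instances of the lemma; it is not a proof.

```python
import random


def cond_exp(f, blocks):
    """E(f | sigma-algebra generated by a partition), uniform measure:
    on each cell, f is replaced by its average over that cell."""
    out = [0.0] * len(f)
    for block in blocks:
        avg = sum(f[i] for i in block) / len(block)
        for i in block:
            out[i] = avg
    return out


def random_partition(size, rng, cells=4):
    """Random partition of {0, ..., size-1} into at most `cells` non-empty cells."""
    groups = {}
    for i in range(size):
        groups.setdefault(rng.randrange(cells), []).append(i)
    return list(groups.values())


def check_lemma(size=30, ell=3, seed=0):
    """Return (lhs, rhs) of Lemma 7.4 for a random non-negative f and
    ell sigma-algebras generated by random partitions."""
    rng = random.Random(seed)
    f = [rng.random() for _ in range(size)]
    integrand = list(f)
    for _ in range(ell):
        cond = cond_exp(f, random_partition(size, rng))
        integrand = [u * v for u, v in zip(integrand, cond)]
    lhs = sum(integrand) / size
    rhs = (sum(f) / size) ** (ell + 1)
    return lhs, rhs
```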



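Proof of Theorem 1.3 assuming Proposition 7.3. Let $\varepsilon>0$. It suffices to show that there exists $r\in\mathbb{N}$ such that

$$\lim_{N-M\to\infty}\frac{1}{N-M}\sum_{n=M}^{N-1}\mu\big(A\cap T_1^{-(rn)^{d_1}}A\cap\dots\cap T_\ell^{-(rn)^{d_\ell}}A\big)>\mu(A)^{\ell+1}-2\varepsilon.\tag{36}$$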

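First we use Proposition 7.3 to choose $r_0\in\mathbb{N}$ so that for every $r\in r_0\mathbb{N}$ we have the estimate (35) with $\varepsilon/2^\ell$ in place of $\varepsilon$. Next we choose a multiple $r$ of $r_0$ such that for $i=1,\dots,\ell$ we have

$$\|\mathbb{E}(\mathbf{1}_A|\mathcal{K}_r(T_i))-\mathbb{E}(\mathbf{1}_A|\mathcal{K}_{\mathrm{rat}}(T_i))\|_{L^2(\mu)}<\varepsilon/\ell.\tag{37}$$

(Such an $r$ exists: $\mathcal{K}_r(T_i)\subseteq\mathcal{K}_{r'}(T_i)$ whenever $r$ divides $r'$, and $\mathcal{K}_{\mathrm{rat}}(T_i)$ is generated by the $\sigma$-algebras $\mathcal{K}_r(T_i)$, $r\in\mathbb{N}$, so it suffices to take $r=r_0\,q!$ with $q$ large enough.)

We claim that for this choice of $r$ equation (36) holds. Indeed, by (35) (with $\varepsilon/2^\ell$ in place of $\varepsilon$) we have that the limit in (36) is $\varepsilon$-close to the limit of the averages of

$$\int\mathbf{1}_A\cdot T_1^{(rn)^{d_1}}\mathbb{E}(\mathbf{1}_A|\mathcal{K}_{\mathrm{rat}}(T_1))\cdot\ldots\cdot T_\ell^{(rn)^{d_\ell}}\mathbb{E}(\mathbf{1}_A|\mathcal{K}_{\mathrm{rat}}(T_\ell))\,d\mu.\tag{38}$$

Using (37) we easily conclude that the limit in (38) is $\varepsilon$-close to the limit of the averages of

$$\int\mathbf{1}_A\cdot T_1^{(rn)^{d_1}}\mathbb{E}(\mathbf{1}_A|\mathcal{K}_r(T_1))\cdot\ldots\cdot T_\ell^{(rn)^{d_\ell}}\mathbb{E}(\mathbf{1}_A|\mathcal{K}_r(T_\ell))\,d\mu.$$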
Since $T^rf=f$ for $\mathcal{K}_r(T)$-measurable functions $f$, and $r$ divides $(rn)^{d_i}$, the last expression is equal to

$$\int\mathbf{1}_A\cdot\mathbb{E}(\mathbf{1}_A|\mathcal{K}_r(T_1))\cdot\ldots\cdot\mathbb{E}(\mathbf{1}_A|\mathcal{K}_r(T_\ell))\,d\mu.$$

By Lemma 7.4 the last integral is greater than or equal to $\mu(A)^{\ell+1}$. It follows that (36) holds and the proof is complete. □
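The first approximation step above, where (35) is applied with $\varepsilon/2^\ell$ in place of $\varepsilon$, rests on a multilinear expansion. Write $g_i=\mathbb{E}(\mathbf{1}_A|\mathcal{K}_{\mathrm{rat}}(T_i))$ and $h_i=\mathbf{1}_A-g_i$ (notation introduced only for this remark), so that $h_i\perp\mathcal{K}_{\mathrm{rat}}(T_i)$ and $|g_i|,|h_i|\leq 1$. Since $\mu(A\cap T_1^{-m_1}A\cap\dots\cap T_\ell^{-m_\ell}A)=\int\mathbf{1}_A\cdot T_1^{m_1}\mathbf{1}_A\cdot\ldots\cdot T_\ell^{m_\ell}\mathbf{1}_A\,d\mu$, one expands, for every $n$,

```latex
\prod_{i=1}^{\ell} T_i^{(rn)^{d_i}}\mathbf{1}_A
  = \sum_{J\subseteq\{1,\dots,\ell\}}\;
    \prod_{i\in J} T_i^{(rn)^{d_i}} h_i
    \prod_{i\notin J} T_i^{(rn)^{d_i}} g_i .
```

The term $J=\emptyset$ gives (38) after multiplying by $\mathbf{1}_A$ and integrating. Each of the $2^\ell-1$ terms with $J\neq\emptyset$ contains some $h_i\perp\mathcal{K}_{\mathrm{rat}}(T_i)$, so, provided $r_0$ is chosen to work for all these finitely many products at once (any common multiple of the values supplied by Proposition 7.3 does), its averages eventually have $L^2(\mu)$ norm below $\varepsilon/2^\ell$. By the Cauchy–Schwarz inequality these terms move the limit in (36) by less than $(2^\ell-1)\varepsilon/2^\ell<\varepsilon$.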

7.4. Some equidistribution results. In the next subsection we prove Proposition 7.3. A crucial step in the proof is an equidistribution result on nilmanifolds that we prove in this subsection. We start with a lemma.

Lemma 7.5. Let $d,m_1,m_2\in\mathbb{N}$ and $T\colon\mathbb{T}^{m_1}\to\mathbb{T}^{m_1}$ be an ergodic unipotent affine transformation. For $i=1,\dots,m_2$, let $u_i\in\mathbb{R}[t]$ be a polynomial divisible by $t^{d+1}$. Suppose that the sequence $(u(n))_{n\in\mathbb{N}}$, with values in $\mathbb{T}^{m_2}$, defined by

$$u(n)=(u_1(n)\bmod 1,\dots,u_{m_2}(n)\bmod 1)$$

is equidistributed on $\mathbb{T}^{m_2}$.

Then for $m_{\mathbb{T}^{m_1}}$-almost every $x\in\mathbb{T}^{m_1}$ the sequence $(T^{n^d}x,u(n))_{n\in\mathbb{N}}$ is equidistributed on $\mathbb{T}^{m_1}\times\mathbb{T}^{m_2}$. Furthermore, the set of full $m_{\mathbb{T}^{m_1}}$-measure can be chosen to depend only on the transformation $T$ (so independently of the sequence $(u(n))_{n\in\mathbb{N}}$).

Proof. Suppose that $T\colon\mathbb{T}^{m_1}\to\mathbb{T}^{m_1}$ is defined by $Tx=Sx+b$ for some unipotent homomorphism $S$ of $\mathbb{T}^{m_1}$ and $b\in\mathbb{T}^{m_1}$. We claim that the desired equidistribution property holds provided that $x$ satisfies the following condition:

$$\text{If } k_1\cdot b+k_2\cdot x=0 \bmod 1 \text{ for some } k_1,k_2\in\mathbb{Z}^{m_1},\text{ then } k_2=0.\tag{39}$$

This defines a set of full measure in $\mathbb{T}^{m_1}$ that depends only on the element $b\in\mathbb{T}^{m_1}$.

Let $x_0$ be any point in $\mathbb{T}^{m_1}$ that satisfies (39). Let $\chi$ be a non-trivial character of $\mathbb{T}^{m_1}\times\mathbb{T}^{m_2}$. Then $\chi(x,y)=\chi_1(x)\chi_2(y)$ for some characters $\chi_1$ of $\mathbb{T}^{m_1}$ and $\chi_2$ of $\mathbb{T}^{m_2}$, and either $\chi_1$ or $\chi_2$ is non-trivial. By Weyl's equidistribution theorem, in order to verify that the sequence $(T^{n^d}x_0,u(n))_{n\in\mathbb{N}}$ is equidistributed on $\mathbb{T}^{m_1}\times\mathbb{T}^{m_2}$, it suffices to show that

$$\lim_{N-M\to\infty}\frac{1}{N-M}\sum_{n=M}^{N-1}\chi_1(T^{n^d}x_0)\cdot\chi_2(u(n))=0.\tag{40}$$

If $\chi_1=1$, then (40) holds because, by assumption, the sequence $(u(n))_{n\in\mathbb{N}}$ is equidistributed on $\mathbb{T}^{m_2}$. Suppose now that $\chi_1\neq 1$. Since $S\colon\mathbb{T}^{m_1}\to\mathbb{T}^{m_1}$ is unipotent, we have $(S-I)^{m_1}=0$. For $n\geq m_1$, a straightforward induction shows that

$$T^nx=\sum_{k=0}^{m_1-1}\binom{n}{k}(S-I)^kx+\sum_{k=0}^{m_1-1}\binom{n}{k+1}(S-I)^kb.$$

Therefore, the sequence $(T^nx)_{n\in\mathbb{N}}$ is polynomial in $n$ and

$$T^nx=x+n\big((S-I)x+b\big)+n^2p(n)\tag{41}$$

for some polynomial $p\in\mathbb{R}[t]$.

Claim. $\chi_1((S-I)x_0+b)=e(\alpha)$ for some irrational number $\alpha$.
Suppose on the contrary that $\chi_1((S-I)x_0+b)=e(\alpha)$ with $\alpha$ rational. After replacing $\chi_1$ by some power of $\chi_1$ we can assume that $\chi_1((S-I)x_0+b)=1$. We write $\chi_1(x)=e(k_1\cdot x)$ where $k_1$ is some non-zero element of $\mathbb{Z}^{m_1}$. Then $k_1\cdot((S-I)x_0+b)=0\pmod 1$, or equivalently,

$$(k_1\cdot(S-I))\cdot x_0+k_1\cdot b=0\pmod 1.\tag{42}$$

Combining (39) and (42) we get that $k_1\cdot(S-I)=0$. Using (42) again we get that $k_1\cdot b=0\pmod 1$. As a result, (42) holds for every $x\in\mathbb{T}^{m_1}$ in place of $x_0$, or equivalently, $\chi_1((S-I)x+b)=1$ for every $x\in\mathbb{T}^{m_1}$. Therefore, $\chi_1(Tx)=\chi_1(x)$. Since $\chi_1\neq 1$, this contradicts our assumption that the transformation $T$ is ergodic. This completes the proof of the claim.

From (41) we conclude that

$$\chi_1(T^nx_0)=e(c+n\alpha+n^2p(n))$$

for some $c\in\mathbb{R}$, some polynomial $p\in\mathbb{R}[t]$, and some irrational $\alpha$.

Using this, and our assumption that all the polynomials $u_i(t)$ are divisible by $t^{d+1}$, we get that

$$\chi_1(T^{n^d}x_0)\cdot\chi_2(u(n))=e(c+n^d\alpha+n^{d+1}q(n))$$

for some polynomial $q\in\mathbb{R}[t]$. Since $\alpha$ is irrational, it follows from this identity and Weyl's equidistribution criterion that (40) holds. This completes the proof. □
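The closed formula for $T^nx$ used in the proof of Lemma 7.5 comes from $T^nx=S^nx+\sum_{j=0}^{n-1}S^jb$, together with $S^n=\sum_k\binom{n}{k}(S-I)^k$ and $\sum_{j=0}^{n-1}\binom{j}{k}=\binom{n}{k+1}$. The Python sketch below, a hypothetical example of ours, checks the formula exactly for an integer unipotent upper-triangular matrix $S$ and integer vectors $x,b$; working in $\mathbb{Z}^{m_1}$ rather than $\mathbb{T}^{m_1}$ is harmless, since the identity is linear and survives reduction mod 1.

```python
from math import comb


def mat_vec(M, v):
    """Matrix-vector product for lists of integers."""
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]


def iterate_affine(S, b, x, n):
    """Apply T(x) = S x + b to x, n times."""
    for _ in range(n):
        x = [s + c for s, c in zip(mat_vec(S, x), b)]
    return list(x)


def closed_form(S, b, x, n):
    """sum_{k<m} C(n,k) (S-I)^k x + sum_{k<m} C(n,k+1) (S-I)^k b,
    valid when (S - I)^m = 0 for the m x m matrix S."""
    m = len(x)
    N = [[S[i][j] - (1 if i == j else 0) for j in range(m)] for i in range(m)]
    out = [0] * m
    vx, vb = list(x), list(b)  # (S-I)^k x and (S-I)^k b, starting at k = 0
    for k in range(m):
        out = [o + comb(n, k) * p + comb(n, k + 1) * q
               for o, p, q in zip(out, vx, vb)]
        vx, vb = mat_vec(N, vx), mat_vec(N, vb)
    return out
```

Here `math.comb(n, k)` returns 0 when `k > n`, so the same formula also covers $n<m_1$.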

Lemma 7.6. Let $X=G/\Gamma$ be a connected nilmanifold, $a\in G$ be an ergodic element, and $d\in\mathbb{N}$. Let $Y=H/\Lambda$ be a nilmanifold, let $(g(n)y)_{n\in\mathbb{N}}$, defined by $g(n)=a_1^{p_1(n)}\cdot\ldots\cdot a_\ell^{p_\ell(n)}$ with $a_1,\dots,a_\ell\in H$, be a polynomial sequence on $Y$, and suppose that the polynomials $p_1,\dots,p_\ell$ are all divisible by $t^{d+1}$.

Then there exists $r_0\in\mathbb{N}$ such that for $m_X$-almost every $x\in X$ we have: For every $r\in r_0\mathbb{N}$, the sequence $(a^{(rn)^d}x,g(rn)y)_{n\in\mathbb{N}}$ is equidistributed on the set $X\times\mathrm{cl}_Y\{g(rn)y,\ n\in\mathbb{N}\}$. Furthermore, the set of full $m_X$-measure can be chosen to depend only on the element $a\in G$ (so independently of $Y$, $y$, and $g(n)$).

Remark. It is crucial for our subsequent applications that the full $m_X$-measure set of the lemma does not depend on the polynomial sequence $(g(n)y)_{n\in\mathbb{N}}$. It is for this reason that we require the polynomials $p_1,\dots,p_\ell$ to be divisible by $t^{d+1}$.

Proof. The connected case. Suppose first that the set $\mathrm{cl}_Y\{g(n)y,\ n\in\mathbb{N}\}$ is connected. In this case we are going to show that $r_0=1$ works.

First, by part (ii) of Theorem 7.1 the set $\mathrm{cl}_Y\{g(n)y,\ n\in\mathbb{N}\}$ is a sub-nilmanifold of $Y$. Substituting this set for $Y$ we can assume that $Y=\mathrm{cl}_Y\{g(n)y,\ n\in\mathbb{N}\}$. By part (ii) of Theorem 7.1 we have

$$\text{the sequence } (g(n)y) \text{ is equidistributed in } Y.\tag{43}$$

(a) First we claim that it suffices to show:

(i) For $m_X$-almost every $x\in X$, where the set of full measure depends only on $a$, the sequence $(a^{n^d}x,g(n)y)$ is equidistributed on the set $X\times Y$.

Indeed, since the nilmanifolds $X$ and $X\times Y$ are connected, it follows by part (ii) of Theorem 7.1 that for every $r\in\mathbb{N}$ we have $\mathrm{cl}_Y\{g(rn)y,\ n\in\mathbb{N}\}=Y$, and for every $r\in\mathbb{N}$ and every $x$ in the set defined in (i), the sequence $(a^{(rn)^d}x,g(rn)y)$ is equidistributed on the set $X\times Y$. This proves the claim.
(b) Next, we use the equidistribution criterion given in Theorem 7.2.

Let $A_X=G/([G_0,G_0]\Gamma)$ be the affine torus of $X$, $A_Y=H/([H_0,H_0]\Lambda)$ be the affine torus of $Y$, and $\pi_{A_X}\colon X\to A_X$, $\pi_{A_Y}\colon Y\to A_Y$ be the corresponding natural projections. We first remark that $A_X\times A_Y$ is the affine torus of $X\times Y$, with projection $\pi_{A_X}\times\pi_{A_Y}$.

Since the sequence $(g(n)y)$ is equidistributed in $Y$, the projection of this sequence onto $A_Y$ is equidistributed on $A_Y$. By Theorem 7.2, in order to show the required equidistribution property (i), it suffices to verify the following statement:

(ii) For $m_X$-almost every $x\in X$, where the set of full measure depends only on $a$, the sequence

$$\big(a^{n^d}\pi_{A_X}(x),\ a_1^{p_1(n)}\cdot\ldots\cdot a_\ell^{p_\ell(n)}\pi_{A_Y}(y)\big)$$

is equidistributed on $A_X\times A_Y$.

This statement is the same as (i), with $A_X$ substituted for $X$ and $A_Y$ substituted for $Y$. We remark that all the hypotheses of the lemma remain valid when we make this substitution.

Therefore, using the identification explained in Section 7.1, we can restrict, without loss of generality, to the case where $X=\mathbb{T}^{m_1}$ for some $m_1\in\mathbb{N}$, the translation $T_a\colon x\mapsto ax$ on $X$ is an ergodic unipotent affine transformation of $\mathbb{T}^{m_1}$, and where $Y=\mathbb{T}^{m_2}$ for some integer $m_2\in\mathbb{N}$ and for $i=1,\dots,\ell$ the translation $T_{a_i}\colon y\mapsto a_iy$ on $Y$ is a unipotent affine transformation of $\mathbb{T}^{m_2}$. Moreover, by (43), the sequence $(T_{a_1}^{p_1(n)}\circ\dots\circ T_{a_\ell}^{p_\ell(n)}y)$ is equidistributed on $\mathbb{T}^{m_2}$.

Since uniform distribution is not affected by translation, statement (ii) can be rewritten in the following equivalent form:

(iii) For $m_{\mathbb{T}^{m_1}}$-almost every $x\in\mathbb{T}^{m_1}$, where the set of full measure depends only on the transformation $T_a$, the sequence
$$\big((T_a^{n^d}x,\ T_{a_1}^{p_1(n)}\cdots T_{a_\ell}^{p_\ell(n)}y-y)\big)_{n\in\mathbb{N}}$$
is equidistributed on $\mathbb{T}^{m_1}\times\mathbb{T}^{m_2}$.
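As a toy numerical illustration of statement (iii), and not a substitute for the argument, take $m_1=m_2=1$, replace the unipotent affine maps by plain circle rotations, and use $d=2$ together with the polynomial $t^{d+1}=t^3$. Weyl's criterion predicts that the normalized exponential sums of the pair $(n^2\alpha,\,n^3\beta) \bmod 1$ at nonzero frequencies tend to 0. The sketch assumes $\alpha=\sqrt2$ and $\beta=\sqrt3$, replaced by dyadic rational approximations so that fractional parts are computed exactly with integer arithmetic; the function names, the sample size and the frequency range are arbitrary choices.

```python
import cmath
import math

SCALE = 1 << 64  # dyadic denominator of the rational approximations


def dyadic_sqrt(k: int) -> int:
    """Numerator A with A / 2**64 = floor(sqrt(k) * 2**64) / 2**64."""
    return math.isqrt(k * SCALE * SCALE)


ALPHA = dyadic_sqrt(2)  # approximates sqrt(2)
BETA = dyadic_sqrt(3)   # approximates sqrt(3)


def frac_poly(coeff: int, n: int, power: int) -> float:
    """Exact fractional part of n**power * coeff / 2**64, returned as a float."""
    return (pow(n, power) * coeff % SCALE) / SCALE


def weyl_sums(N: int, d: int = 2, kmax: int = 2) -> dict:
    """|(1/N) sum_n e(k1*x_n + k2*y_n)| with x_n = n^d*alpha, y_n = n^(d+1)*beta."""
    xs = [frac_poly(ALPHA, n, d) for n in range(1, N + 1)]
    ys = [frac_poly(BETA, n, d + 1) for n in range(1, N + 1)]
    out = {}
    for k1 in range(-kmax, kmax + 1):
        for k2 in range(-kmax, kmax + 1):
            if (k1, k2) == (0, 0):
                continue  # the zero frequency always has average 1
            s = sum(cmath.exp(2j * math.pi * (k1 * x + k2 * y)) for x, y in zip(xs, ys))
            out[(k1, k2)] = abs(s) / N
    return out


if __name__ == "__main__":
    sums = weyl_sums(20_000)
    print(max(sums.values()))  # small, consistent with equidistribution
```

Finite sums can only suggest equidistribution; the lemma is what proves it for almost every starting point.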

(c) Define the sequence $(u(n))_{n\in\mathbb{N}}$ with values in $\mathbb{T}^{m_2}$ by
$$u(n)=T_{a_1}^{p_1(n)}\cdots T_{a_\ell}^{p_\ell(n)}y-y.$$

For $i=1,\dots,\ell$, since $T_{a_i}$ is a unipotent affine transformation, $T_{a_i}^ny$ is given for every $n$ by a formula similar to (41). Therefore, for $j=1,\dots,m_2$, each coordinate $u_j(n)$ of $u(n)$ is a polynomial in $n$ with real coefficients and without a constant term. Moreover, since by hypothesis the polynomials $p_i(t)$ are divisible by $t^{d+1}$, all the polynomials $u_j(t)$ are divisible by $t^{d+1}$.

Hence Lemma 7.5 is applicable, and statement (iii) is proved. This completes the proof of the result in the case where the set $\mathrm{cl}_Y\{g(n)y : n\in\mathbb{N}\}$ is connected.
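To see concretely why the coordinates of $u(n)$ are polynomials in $n$ without a constant term, here is a hypothetical two-dimensional instance (formula (41) itself is not reproduced): the unipotent affine map $T(x,y)=(x+\alpha,\,y+x)$ of $\mathbb{T}^2$ satisfies $T^n(x,y)=(x+n\alpha,\ y+nx+\binom{n}{2}\alpha)$, so $T^n(x,y)-(x,y)$ has polynomial coordinates vanishing at $n=0$. The sketch checks this identity with exact rational arithmetic; $\alpha$ and the starting point are arbitrary.

```python
from fractions import Fraction


def step(point, alpha):
    """One application of the unipotent affine map (x, y) -> (x + alpha, y + x) on T^2."""
    x, y = point
    return ((x + alpha) % 1, (y + x) % 1)


def closed_form(point, alpha, n):
    """Closed form of T^n(point): each coordinate is a polynomial in n (reduced mod 1)."""
    x, y = point
    return ((x + n * alpha) % 1, (y + n * x + Fraction(n * (n - 1), 2) * alpha) % 1)


def displacement(point, alpha, n):
    """u(n) = T^n(point) - point mod 1; both coordinates vanish at n = 0."""
    tx, ty = closed_form(point, alpha, n)
    return ((tx - point[0]) % 1, (ty - point[1]) % 1)


if __name__ == "__main__":
    alpha = Fraction(1414213562, 10**9)  # rational stand-in for an irrational angle
    start = (Fraction(1, 3), Fraction(2, 7))
    pt = start
    for n in range(1, 50):
        pt = step(pt, alpha)
        assert pt == closed_form(start, alpha, n)
    print(displacement(start, alpha, 0))  # (Fraction(0, 1), Fraction(0, 1))
```

Substituting $n=p_i(m)$ with $t^{d+1}\mid p_i(t)$ into such polynomial coordinates keeps them divisible by $t^{d+1}$, which is the property used above.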

The general case. Lastly, we deal with the case where the set $\mathrm{cl}_Y\{g(n)y : n\in\mathbb{N}\}$ is not necessarily connected. By Theorem 7.1 there exists $r_0\in\mathbb{N}$ such that the set $\mathrm{cl}_Y\{g(r_0n)y : n\in\mathbb{N}\}$ is connected. Substituting the sequence $(g(r_0n)y)$ for $(g(n)y)$ and $a^{r_0^d}$ for $a$ (which is again an ergodic element), the previous argument shows the advertised result for this value of $r_0\in\mathbb{N}$. This completes the proof of the result in the general case. □

We deduce from the previous lemma a result that is more suitable for our purposes: 
Corollary 7.7. Let $X=G/\Gamma$ be a nilmanifold, $a\in G$ be an ergodic element, $f\in C(X)$ with $\mathbb{E}_{m_X}(f\mid\mathcal{K}_{\mathrm{rat}}(T_a))=0$, and $d,d_1,\dots,d_\ell\in\mathbb{N}$ with $d<d_i$ for $i=1,\dots,\ell$. Suppose that $(u_{1,n})_{n\in\mathbb{N}},\dots,(u_{\ell,n})_{n\in\mathbb{N}}$ are finite step nilsequences.

Then there exists $r_0\in\mathbb{N}$ such that, for $m_X$-almost every $x\in X$, the following holds: for every $r\in r_0\mathbb{N}$ we have
$$\lim_{N-M\to\infty}\frac{1}{N-M}\sum_{n=M}^{N-1} f(a^{(rn)^d}x)\cdot u_{1,(rn)^{d_1}}\cdots u_{\ell,(rn)^{d_\ell}}=0. \tag{44}$$
Furthermore, the set of full $m_X$-measure can be chosen to depend only on the element $a\in G$.

Proof. The connected case. Suppose first that the nilmanifold $X$ is connected. Using an approximation argument we can assume that for $i=1,\dots,\ell$ the sequence $(u_{i,n})$ is a basic finite step nilsequence. In this case, for $i=1,\dots,\ell$ there exist nilmanifolds $X_i=G_i/\Gamma_i$, elements $a_i\in G_i$, and functions $f_i\in C(X_i)$ such that $u_{i,n}=f_i(a_i^n\Gamma_i)$. We define $\tilde G=G_1\times\cdots\times G_\ell$, $\tilde\Gamma=\Gamma_1\times\cdots\times\Gamma_\ell$, and $\tilde X=X_1\times\cdots\times X_\ell=\tilde G/\tilde\Gamma$. Let $(g(n))_{n\in\mathbb{N}}$ be the polynomial sequence in $\tilde G$ given by $g(n)=(a_1^{n^{d_1}},\dots,a_\ell^{n^{d_\ell}})$ for every $n$.

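Lemma 7.6 gives that there exists $r_0\in\mathbb{N}$ such that for $m_X$-almost every $x\in X$ we have: for every $r\in r_0\mathbb{N}$, the sequence $(a^{(rn)^d}x,\ g(rn)\tilde\Gamma)_{n\in\mathbb{N}}$ is equidistributed on the nilmanifold $X\times Y$, where $Y=\mathrm{cl}_{\tilde X}\{g(rn)\tilde\Gamma : n\in\mathbb{N}\}$.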

Therefore, for every $f\in C(X)$ and $F\in C(\tilde X)$ we have
$$\lim_{N-M\to\infty}\frac{1}{N-M}\sum_{n=M}^{N-1} f(a^{(rn)^d}x)\cdot F(g(rn)\tilde\Gamma)=\int_X f(x)\,dm_X(x)\cdot\int_Y F(z)\,dm_Y(z).$$



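Letting $F=f_1\otimes\cdots\otimes f_\ell$, and using that $\int f\,dm_X=0$ (on the connected nilmanifold $X$ the ergodic element $a$ is totally ergodic, so $\mathcal{K}_{\mathrm{rat}}(T_a)$ is trivial and the hypothesis $\mathbb{E}_{m_X}(f\mid\mathcal{K}_{\mathrm{rat}}(T_a))=0$ reduces to $\int f\,dm_X=0$), we get the advertised identity. This completes the proof in the case where the nilmanifold $X$ is connected.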

The general case. Let $X_0$ be a connected component of the nilmanifold $X$. Since $a$ is an ergodic element, there exists $k\in\mathbb{N}$ such that $X$ is the disjoint union of the sub-nilmanifolds $X_i=a^iX_0$, $i=1,\dots,k$, and $a^k$ acts ergodically on each $X_i$. Furthermore, since $\mathbb{E}_{m_X}(f\mid\mathcal{K}_{\mathrm{rat}}(T_a))=0$ and each indicator function $\mathbf{1}_{X_i}$ is $T_a^k$-invariant, hence $\mathcal{K}_{\mathrm{rat}}(T_a)$-measurable, we have $\int f\,dm_{X_i}=0$ for $i=1,\dots,k$. For $i=1,\dots,k$, we can apply the previously established "connected result" to the translation $a^{k^d}$ in place of $a$, acting (ergodically) on the connected sub-nilmanifold $X_i$, and to the nilsequences $(u_{j,k^{d_j}n})$ in place of $(u_{j,n})$, $j=1,\dots,\ell$. We get that there exist $r_i\in\mathbb{N}$ such that for every $r\in kr_i\mathbb{N}$ equation (44) holds for $m_{X_i}$-almost every $x\in X_i$. It follows that if $r_0=k\prod_{i=1}^k r_i$, then for every $r\in r_0\mathbb{N}$ equation (44) holds for $m_X$-almost every $x\in X$. This completes the proof in the general case. □



7.5. Proof of the convergence result (Proposition 7.3). In this section we prove Proposition 7.3 by induction on the number of transformations involved. The key ingredient in the proof of the inductive step is the following special case of Proposition 7.3:

Lemma 7.8. Let $(X,\mathcal{X},\mu,T_1,\dots,T_\ell)$ be a system. Let $d_1,\dots,d_\ell\in\mathbb{N}$ be distinct and suppose that $d_1<d_i$ for $i=2,\dots,\ell$. Let $f_1,\dots,f_\ell\in L^\infty(\mu)$ and suppose that $f_1\perp\mathcal{K}_{\mathrm{rat}}(T_1)$.

Then for every $\varepsilon>0$ there exists $r_0\in\mathbb{N}$ such that for every $r\in r_0\mathbb{N}$ we have
$$\lim_{N-M\to\infty}\Big\|\frac{1}{N-M}\sum_{n=M}^{N-1} f_1(T_1^{(rn)^{d_1}}x)\cdots f_\ell(T_\ell^{(rn)^{d_\ell}}x)\Big\|_{L^2(\mu)}<\varepsilon.$$



Proof. Let $\varepsilon>0$. Without loss of generality we can assume that all the functions involved are bounded by 1. From Theorem 1.2 we have that there exists $k\in\mathbb{N}$, depending only on $\max(d_1,\dots,d_\ell)$, such that if $f_i\perp\mathcal{Z}_{k,T_i}$ for some $i=1,\dots,\ell$, then the corresponding multiple ergodic averages converge to 0 in $L^2(\mu)$. Therefore, we can assume that $f_i\in L^\infty(\mathcal{Z}_{k,T_i},\mu)$ for $i=1,\dots,\ell$. Then Proposition 3.1 shows that for $i=2,\dots,\ell$ there exist functions $\tilde f_i$, with $L^\infty$-norm bounded by 1, that satisfy

(i) $\tilde f_i\in L^\infty(\mathcal{Z}_{k,T_i},\mu)$ and $\|f_i-\tilde f_i\|_{L^2(\mu)}<\varepsilon/(2\ell+2)$;

(ii) for every $r\in\mathbb{N}$ and $x\in X$ the sequence $(\tilde f_i(T_i^{(rn)^{d_i}}x))_{n\in\mathbb{N}}$ is a $(d_ik)$-step nilsequence.

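An easy computation then shows that, in order to prove the announced claim, it suffices to show the following: if $f_1\in L^\infty(\mathcal{Z}_{k,T_1},\mu)$ and $f_1\perp\mathcal{K}_{\mathrm{rat}}(T_1)$, then there exists $r_0\in\mathbb{N}$ such that for every $r\in r_0\mathbb{N}$ we have
$$\lim_{N-M\to\infty}\Big\|\frac{1}{N-M}\sum_{n=M}^{N-1} f_1(T_1^{(rn)^{d_1}}x)\cdot\tilde f_2(T_2^{(rn)^{d_2}}x)\cdots\tilde f_\ell(T_\ell^{(rn)^{d_\ell}}x)\Big\|_{L^2(\mu)}<\frac{\varepsilon}{2}. \tag{45}$$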
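One way to carry out the "easy computation" is sketched below, using only that all functions are bounded by 1 in $L^\infty(\mu)$ and that each $T_i$ preserves $\mu$. By the telescoping identity $\prod_i a_i-\prod_i b_i=\sum_i b_1\cdots b_{i-1}(a_i-b_i)a_{i+1}\cdots a_\ell$, replacing $f_2,\dots,f_\ell$ by $\tilde f_2,\dots,\tilde f_\ell$ changes the averages by at most

```latex
\Big\|\frac{1}{N-M}\sum_{n=M}^{N-1}\Big(\prod_{i=1}^{\ell}f_i(T_i^{(rn)^{d_i}}x)
  -f_1(T_1^{(rn)^{d_1}}x)\prod_{i=2}^{\ell}\tilde f_i(T_i^{(rn)^{d_i}}x)\Big)\Big\|_{L^2(\mu)}
\le\sum_{i=2}^{\ell}\|f_i-\tilde f_i\|_{L^2(\mu)}
<\frac{(\ell-1)\,\varepsilon}{2\ell+2}<\frac{\varepsilon}{2},
```

so it is enough to bound the averages involving $f_1,\tilde f_2,\dots,\tilde f_\ell$ by $\varepsilon/2$.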



The ergodic case. Suppose first that the transformation $T_1$ is ergodic. Since $f_1\in L^\infty(\mathcal{Z}_{k,T_1},\mu)$, after using an appropriate conjugation we can assume that $T_1$ is an inverse limit of nilsystems. Furthermore, after using an approximation argument we can assume that $T_1=T_a$, where $a$ is an ergodic rotation on a nilmanifold $X=G/\Gamma$, and $f_1\in C(X)$, while still maintaining our assumption that $f_1\perp\mathcal{K}_{\mathrm{rat}}(T_1)$. (If $f\perp\mathcal{D}$, where $\mathcal{D}$ is any sub-$\sigma$-algebra of $\mathcal{X}$, and $g$ is such that $\|f-g\|_{L^1(\mu)}<\varepsilon/2$, then $\|\mathbb{E}(g\mid\mathcal{D})\|_{L^1(\mu)}<\varepsilon/2$. Therefore $\|f-\tilde g\|_{L^1(\mu)}<\varepsilon$, where $\tilde g=g-\mathbb{E}(g\mid\mathcal{D})$, and $\mathbb{E}(\tilde g\mid\mathcal{D})=0$.) In this case, combining property (ii) above and Corollary 7.7, we get that there exists $r_0\in\mathbb{N}$ such that for every $r\in r_0\mathbb{N}$ the averages in (45) converge to 0 for $m_X$-almost every $x\in X$, and as a result in $L^2(m_X)$. This completes the proof of (45) in the case where the transformation $T_1$ is ergodic.
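The parenthetical approximation step can be checked on a finite probability space, where a sub-$\sigma$-algebra $\mathcal{D}$ is generated by a partition and $\mathbb{E}(\cdot\mid\mathcal{D})$ averages over the blocks. The sketch below, with hypothetical names and an arbitrary small space, builds $f$ with $\mathbb{E}(f\mid\mathcal{D})=0$, perturbs it to $g$, and confirms that $\tilde g=g-\mathbb{E}(g\mid\mathcal{D})$ satisfies $\mathbb{E}(\tilde g\mid\mathcal{D})=0$ and $\|f-\tilde g\|_{L^1}\le2\|f-g\|_{L^1}$, which is the content of the remark.

```python
import random


def cond_exp(values, weights, blocks):
    """E(values | D), where D is generated by the finite partition `blocks`."""
    out = [0.0] * len(values)
    for block in blocks:
        mass = sum(weights[i] for i in block)
        mean = sum(weights[i] * values[i] for i in block) / mass
        for i in block:
            out[i] = mean
    return out


def l1_norm(values, weights):
    """L^1 norm with respect to the probability weights."""
    return sum(w * abs(v) for v, w in zip(values, weights))


def make_instance(seed=0, size=12):
    """Return f with E(f|D) = 0, g_tilde = g - E(g|D), the space, and ||f - g||_1."""
    rng = random.Random(seed)
    raw = [rng.random() + 0.01 for _ in range(size)]
    total = sum(raw)
    weights = [w / total for w in raw]                      # a probability measure
    blocks = [list(range(k, size, 3)) for k in range(3)]    # partition into 3 blocks
    v = [rng.uniform(-1, 1) for _ in range(size)]
    f = [a - b for a, b in zip(v, cond_exp(v, weights, blocks))]   # E(f|D) = 0
    g = [a + rng.uniform(-0.05, 0.05) for a in f]                  # g is close to f
    g_tilde = [a - b for a, b in zip(g, cond_exp(g, weights, blocks))]
    dist_fg = l1_norm([a - b for a, b in zip(f, g)], weights)
    return f, g_tilde, weights, blocks, dist_fg
```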

The general case. Suppose now that the transformation $T_1$ is not necessarily ergodic. Let $\mu=\int\mu_x\,d\mu(x)$ be the ergodic decomposition of $\mu$ with respect to the transformation $T_1$. Since $f_1\in L^\infty(\mathcal{Z}_{k,T_1},\mu)$, Corollary 3.3 shows that for $\mu$-almost every $x\in X$ we have $f_1\in L^\infty(\mathcal{Z}_{k,T_1},\mu_x)$. Furthermore, since $\mathbb{E}_\mu(f_1\mid\mathcal{K}_{\mathrm{rat}}(T_1,\mu))=0$, we have for $\mu$-almost every $x\in X$ that $\mathbb{E}_{\mu_x}(f_1\mid\mathcal{K}_{\mathrm{rat}}(T_1,\mu_x))=0$.

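For every $r_0\in\mathbb{N}$ we define the $\mu$-measurable set
$$X_{r_0}=\{x\in X : \text{(45) holds for every } r\in r_0\mathbb{N},\ \text{with } \mu_x \text{ in place of } \mu \text{ and } \varepsilon/2 \text{ in place of } \varepsilon\}.$$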

Notice that when we previously established the "ergodic case", we did not use the invariance of the measure $\mu$ under the transformations $T_i$ for $i\neq1$; we merely used the fact that, for $i=2,\dots,\ell$ and for $\mu$-almost every $x\in X$, the sequences $(\tilde f_i(T_i^nx))_{n\in\mathbb{N}}$ are $k$-step nilsequences.
Hence, we can use the previously established "ergodic result" for $\mu$-almost every measure $\mu_x$, and conclude that
$$\mu\Big(\bigcup_{r\in\mathbb{N}}X_r\Big)=1.$$
Also, we clearly have $X_r\subset X_s$ if $r$ divides $s$. In particular $X_r\subset X_{j!}$ whenever $r\le j$, so the sets $X_{j!}$ increase with $j$ and their union has full measure. It follows that there exists $r_0\in\mathbb{N}$ such that
$$\mu(X_{r_0})>1-\varepsilon/4.$$
As a direct consequence, for this choice of $r_0$, equation (45) holds for every $r\in r_0\mathbb{N}$. This completes the proof. □

We are now ready to prove Proposition 7.3.

Proof of Proposition 7.3. Without loss of generality we can assume that $d_1<d_i<d_\ell$ for $i=2,\dots,\ell-1$ and $\|f_i\|_{L^\infty(\mu)}\le1$ for $i=1,\dots,\ell$.



We are going to use induction on the number of transformations $\ell$. For $\ell=1$ the statement is known (Chapter 3 in [18]) and in fact it holds with $r_0=1$ and $\varepsilon=0$. Suppose that $\ell\ge2$ and that the statement holds for $\ell-1$ transformations. We are going to show that it holds for $\ell$ transformations. Namely, we are going to show that if $f_i\perp\mathcal{K}_{\mathrm{rat}}(T_i)$ for some $i\in\{1,\dots,\ell\}$, then for every $\varepsilon>0$ there exists $r_0\in\mathbb{N}$ such that for every $r\in r_0\mathbb{N}$ we have
$$\lim_{N-M\to\infty}\Big\|\frac{1}{N-M}\sum_{n=M}^{N-1} f_1(T_1^{(rn)^{d_1}}x)\cdots f_\ell(T_\ell^{(rn)^{d_\ell}}x)\Big\|_{L^2(\mu)}<\varepsilon.$$



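Let $\varepsilon>0$. If $f_1\perp\mathcal{K}_{\mathrm{rat}}(T_1)$, then the result follows from Lemma 7.8. So we can assume that $f_i\perp\mathcal{K}_{\mathrm{rat}}(T_i)$ for some $i\in\{2,\dots,\ell\}$. By Lemma 7.8 we can assume that the function $f_1$ is $\mathcal{K}_{\mathrm{rat}}(T_1)$-measurable. Furthermore, using a standard approximation argument we can assume that the function $f_1$ is $\mathcal{K}_{r_1}(T_1)$-measurable for some $r_1\in\mathbb{N}$. Since for every $r\in r_1\mathbb{N}$ we have $T_1^rf_1=f_1$, and $\|f_1\|_{L^\infty(\mu)}\le1$, it remains to find $r_0\in r_1\mathbb{N}$ such that for every $r\in r_0\mathbb{N}$ we have
$$\lim_{N-M\to\infty}\Big\|\frac{1}{N-M}\sum_{n=M}^{N-1} f_2(T_2^{(rn)^{d_2}}x)\cdots f_\ell(T_\ell^{(rn)^{d_\ell}}x)\Big\|_{L^2(\mu)}<\varepsilon.$$
Such an integer $r_0$ exists by the induction hypothesis: applied to the $\ell-1$ transformations $T_2,\dots,T_\ell$, it gives some $r'\in\mathbb{N}$ that works for every $r\in r'\mathbb{N}$, and we can take $r_0=r_1r'$. This completes the induction and the proof. □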
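A hypothetical finite model of the step "$T_1^rf_1=f_1$ for $r\in r_1\mathbb{N}$": let $T_1$ be the rotation $x\mapsto x+1$ on $\mathbb{Z}/(r_1m)\mathbb{Z}$ and let $f_1$ depend only on $x \bmod r_1$, so that $f_1$ is $T_1^{r_1}$-invariant. The sketch checks that $f_1(T_1^{(rn)^{d_1}}x)=f_1(x)$ for all $x$ and $n$ when $r_1\mid r$, while this can fail otherwise; the particular $f_1$ and sizes are arbitrary.

```python
def f1_values(r1, m):
    """A function on Z/(r1*m) that depends only on x mod r1 (hence T1^{r1}-invariant)."""
    return [(x % r1) ** 2 + 1 for x in range(r1 * m)]


def unchanged_along_powers(r1, m, d1, r, n_max=40):
    """Check f1(T1^{(r n)^{d1}} x) == f1(x) for every x and every 0 <= n < n_max."""
    size = r1 * m
    f1 = f1_values(r1, m)
    return all(
        f1[(x + (r * n) ** d1) % size] == f1[x]
        for x in range(size)
        for n in range(n_max)
    )


if __name__ == "__main__":
    print(unchanged_along_powers(3, 5, 2, r=6))  # True: 3 divides 6
    print(unchanged_along_powers(3, 5, 2, r=1))  # False: the factor f1 moves
```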



Appendix A. Some "simple" proofs of special cases of the main results 

It turns out that Theorem 11.21 can be strengthened, and the proof of Theorems 11.11 [L2l and 
II. 3\ can be greatly simplified in some interesting special cases, namely when £ = 2 and one 
of the two polynomials is linear. Such a simplification is feasible because of the nature of the 
averages involved; it turns out to be possible to get simple characteristic factors by using a 
variation of van der Corput's Lemma, and then appealing to a known result from [15] . We take 
the opportunity in this section to give these simple arguments. Hopefully, the non-persistent 
reader, that does not want to embark to the details of the more complicated proofs of our main 
results, will benefit from the proofs of the special cases given here. 

The key ingredient in the proofs is the following result: 
Theorem A.1 ([15]). Let $(X,\mathcal{X},\mu,T)$ be a system and suppose that the integer polynomials $1,p,q$ are linearly independent. Let $f,g\in L^\infty(\mu)$ and suppose that $f\perp\mathcal{K}_{\mathrm{rat}}(T)$ or $g\perp\mathcal{K}_{\mathrm{rat}}(T)$.

Then
$$\lim_{N-M\to\infty}\frac{1}{N-M}\sum_{n=M}^{N-1} f(T^{p(n)}x)\cdot g(T^{q(n)}x)=0,$$
where the convergence takes place in $L^2(\mu)$.

Remark. The proof in [15] is given for ergodic systems, but the announced result follows directly from this, since $f\perp\mathcal{K}_{\mathrm{rat}}(T,\mu)$ implies that $f\perp\mathcal{K}_{\mathrm{rat}}(T,\mu_x)$ for $\mu$-almost every $x\in X$, where, as usual, $\mu=\int\mu_x\,d\mu(x)$ is the ergodic decomposition of $\mu$.

We are also going to use the following variation of the classical elementary lemma of van der 
Corput. Its proof is a straightforward modification of the one given in [5]. 

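Lemma A.2. Let $(v_{N,n})_{N,n\in\mathbb{N}}$ be a bounded sequence of vectors in a Hilbert space. For every $h\in\mathbb{N}$ we set
$$b_h=\limsup_{N\to\infty}\Big|\frac{1}{N}\sum_{n=1}^{N}\langle v_{N,n+h},v_{N,n}\rangle\Big|.$$
Suppose that
$$\lim_{H\to\infty}\frac{1}{H}\sum_{h=1}^{H}b_h=0.$$
Then
$$\lim_{N\to\infty}\Big\|\frac{1}{N}\sum_{n=1}^{N}v_{N,n}\Big\|=0.$$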
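A finite-$N$ numerical illustration of Lemma A.2 (a sketch, not a proof): in the one-dimensional Hilbert space $\mathbb{C}$ take $v_{N,n}=e(\alpha n^2)$ with $\alpha=\sqrt2$, independent of $N$. Then $\langle v_{n+h},v_n\rangle=e(\alpha(2hn+h^2))$, whose averages are geometric sums, so $b_h=0$ for every $h\ge1$ and the lemma predicts that the averages of $v_n$ tend to 0. The choices of $\alpha$, $N$ and $H$ are arbitrary.

```python
import cmath
import math

ALPHA = math.sqrt(2)


def e(x: float) -> complex:
    """e(x) = exp(2 pi i x)."""
    return cmath.exp(2j * math.pi * x)


def sequence(length: int) -> list:
    """v_n = e(alpha n^2) for 0 <= n <= length; reducing mod 1 first keeps the phase accurate."""
    return [e((ALPHA * n * n) % 1.0) for n in range(length + 1)]


def correlation(v: list, h: int, N: int) -> float:
    """|(1/N) sum_{n=1}^{N} <v_{n+h}, v_n>| with <z, w> = z * conj(w)."""
    return abs(sum(v[n + h] * v[n].conjugate() for n in range(1, N + 1))) / N


def vdc_demo(N: int = 20_000, H: int = 20):
    """Return (average of the finite-N correlations over 1 <= h <= H, |average of v_n|)."""
    v = sequence(N + H)
    mean_b = sum(correlation(v, h, N) for h in range(1, H + 1)) / H
    avg_norm = abs(sum(v[1:N + 1])) / N
    return mean_b, avg_norm


if __name__ == "__main__":
    print(vdc_demo())  # both quantities are small
```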



We start with the following strengthening of Theorem 1.2 in our particular setup:

Theorem A.3. Let $(X,\mathcal{X},\mu,T,S)$ be a system. Let $f,g\in L^\infty(\mu)$ and suppose that either $f\perp\mathcal{K}_{\mathrm{rat}}(T)$ or $g\perp\mathcal{K}_{\mathrm{rat}}(S)$.

Then for every polynomial $p\in\mathbb{Z}[t]$ with $\deg(p)\ge2$ we have
$$\lim_{N-M\to\infty}\frac{1}{N-M}\sum_{n=M}^{N-1} f(T^nx)\cdot g(S^{p(n)}x)=0, \tag{46}$$
where the convergence takes place in $L^2(\mu)$.

Proof. Suppose first that $\mathbb{E}(g\mid\mathcal{K}_{\mathrm{rat}}(S))=0$. It suffices to show that for every sequence of intervals $(I_N)_{N\in\mathbb{N}}$ with lengths increasing to infinity, the averages in $n$ over the intervals $I_N$ of
$$\int h_N(x)\cdot f(T^nx)\cdot g(S^{p(n)}x)\,d\mu$$
converge to 0, where $h_N(x)=\frac{1}{|I_N|}\sum_{n\in I_N}f(T^nx)\cdot g(S^{p(n)}x)$. Equivalently, it suffices to show that the averages over the intervals $I_N$ of
$$\int f(x)\cdot h_N(T^{-n}x)\cdot g(S^{p(n)}T^{-n}x)\,d\mu$$
converge to 0. Using the Cauchy-Schwarz inequality, it suffices to show that the averages over the intervals $I_N$ of
$$h_N(T^{-n}x)\cdot g(S^{p(n)}T^{-n}x)$$
converge to 0 in $L^2(\mu)$. By Lemma A.2, it suffices to show that for every $m\in\mathbb{N}$ the averages in $n$ over the intervals $I_N$ of
$$\int h_N(T^{-n}x)\cdot g(S^{p(n)}T^{-n}x)\cdot h_N(T^{-(n+m)}x)\cdot g(S^{p(n+m)}T^{-(n+m)}x)\,d\mu$$
converge to 0. We compose with the transformation $T^n$ and use the Cauchy-Schwarz inequality. It suffices to show that for every $m\in\mathbb{N}$ the averages in $n$ over the intervals $I_N$ of
$$g(S^{p(n)}x)\cdot g(T^{-m}S^{p(n+m)}x)$$
converge to 0 in $L^2(\mu)$. Since $\deg(p)\ge2$, for every $m\in\mathbb{N}$ the polynomials $1,p(n),p(n+m)$ are linearly independent. Since $g\perp\mathcal{K}_{\mathrm{rat}}(S)$, Theorem A.1, applied to the transformation $S$ and the functions $g$ and $g\circ T^{-m}$, shows that these averages indeed converge to 0.
It remains to show that if $f\perp\mathcal{K}_{\mathrm{rat}}(T)$, then the averages over the intervals $I_N$ of
$$f(T^nx)\cdot g(S^{p(n)}x)$$
converge to 0 in $L^2(\mu)$. Using the previously established property, we get that the above limit remains unchanged if we replace the function $g$ with the function $\mathbb{E}(g\mid\mathcal{K}_{\mathrm{rat}}(S))$. Furthermore, using an approximation argument and linearity, we can assume that $Sg=e(r)g$ for some $r\in\mathbb{Q}$. In this case, it suffices to show that the averages over the intervals $I_N$ of
$$f(T^nx)\cdot e(rp(n))$$
converge to 0 in $L^2(\mu)$. Using the spectral theorem for unitary operators, it suffices to show that for every $r\in\mathbb{Q}$ we have
$$\lim_{N\to\infty}\int\Big|\frac{1}{|I_N|}\sum_{n\in I_N}e(nt+rp(n))\Big|^2\,d\sigma_f(t)=0, \tag{47}$$
where $\sigma_f$ denotes the spectral measure of the function $f$. Since $f\perp\mathcal{K}_{\mathrm{rat}}(T)$, the measure $\sigma_f$ has no rational point masses, so $\sigma_f$-almost every $t$ is irrational. Furthermore, as is well known, for $t$ irrational the averages inside the integral in (47) converge to 0 pointwise. Combining these two facts, and using the bounded convergence theorem, we deduce that (47) holds. This completes the proof. □
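A quick numerical look at the inner averages in (47), assuming $p(n)=n^2$ and $r=1/3$ (hypothetical choices; the spectral measure $\sigma_f$ is not modelled). For irrational $t$ the averages are small, while for a rational $t$ they need not be, which is why the absence of rational point masses in $\sigma_f$ matters. For $t=1/3$ the summand is periodic with values $1,e(2/3),1$, so the average over a multiple of three terms has modulus $|2+e(2/3)|/3=\sqrt3/3$.

```python
import cmath
import math
from fractions import Fraction


def phase_average(t: float, r: Fraction, N: int) -> float:
    """|(1/N) sum_{n=0}^{N-1} e(n t + r n^2)|: the inner average of (47) with p(n) = n^2."""
    total = 0j
    for n in range(N):
        rational_part = float((r * n * n) % 1)  # exact reduction mod 1, since r is rational
        irrational_part = (n * t) % 1.0
        total += cmath.exp(2j * math.pi * (irrational_part + rational_part))
    return abs(total) / N


if __name__ == "__main__":
    r = Fraction(1, 3)
    print(phase_average(math.sqrt(2), r, 30_000))  # close to 0
    print(phase_average(1 / 3, r, 30_000))         # about 0.577
```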



We deduce the following special case of Theorem 1.1:

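Theorem A.4. Let $(X,\mathcal{X},\mu,T,S)$ be a system and $f,g\in L^\infty(\mu)$. Let $p\in\mathbb{Z}[t]$ with $\deg(p)\ge2$.

Then the limit
$$\lim_{N-M\to\infty}\frac{1}{N-M}\sum_{n=M}^{N-1} f(T^nx)\cdot g(S^{p(n)}x)$$
exists in $L^2(\mu)$.

Proof. By Theorem A.3 we can assume that the function $f$ is $\mathcal{K}_{\mathrm{rat}}(T)$-measurable and the function $g$ is $\mathcal{K}_{\mathrm{rat}}(S)$-measurable. Furthermore, using an approximation argument we can assume that $T^rf=f$ and $S^rg=g$ for some $r\in\mathbb{N}$. In this case the result is obvious: since $p$ has integer coefficients, $p(n+r)\equiv p(n) \pmod r$, so the sequence $n\mapsto f(T^nx)\cdot g(S^{p(n)}x)$ is periodic with period $r$. □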
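The periodicity behind "the result is obvious" can be checked mechanically. The sketch below is a hypothetical finite model in which the values of $f$ along a $T$-orbit and of $g$ along an $S$-orbit are recorded as lists of period $r$; it verifies $p(n+r)\equiv p(n) \pmod r$ for an integer polynomial and that averages over windows whose length is a multiple of $r$ equal the mean over one period, so the averages converge as $N-M\to\infty$.

```python
from fractions import Fraction


def poly(coeffs, n):
    """Evaluate p(n) = sum_k coeffs[k] * n**k for an integer polynomial p."""
    return sum(c * n ** k for k, c in enumerate(coeffs))


def make_sequence(F, G, coeffs, r):
    """Toy stand-in for n -> f(T^n x) g(S^{p(n)} x) when T^r f = f and S^r g = g.

    F[j] plays the role of f(T^j x) and G[j] of g(S^j x), both periodic in j with period r.
    """
    return lambda n: F[n % r] * G[poly(coeffs, n) % r]


def window_average(a, M, N):
    """Exact average of a(n) over M <= n < N."""
    return Fraction(sum(a(n) for n in range(M, N)), N - M)


if __name__ == "__main__":
    coeffs, r = [5, -3, 2], 6  # p(n) = 2n^2 - 3n + 5
    a = make_sequence([1, 4, -2, 7, 0, 3], [2, -1, 5, 5, 0, 1], coeffs, r)
    print(window_average(a, 0, r), window_average(a, 11, 11 + 600))  # equal
```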

As a corollary we get a short proof of weak convergence for some multiple ergodic averages recently studied by T. Austin in [4] (where strong convergence was proven when $p(n)=n^2$).
Corollary A.5. Let $(X,\mathcal{X},\mu,T,S)$ be a system and $f,g\in L^\infty(\mu)$. Let $p\in\mathbb{Z}[t]$ with $\deg(p)\ge2$. Then the averages
$$\frac{1}{N-M}\sum_{n=M}^{N-1} f(T^{p(n)}x)\cdot g(T^{p(n)}S^nx) \tag{48}$$
converge weakly in $L^2(\mu)$ as $N-M\to\infty$. Furthermore, the limit is 0 if either $g\perp\mathcal{K}_{\mathrm{rat}}(S)$ or $f\perp(\mathcal{K}_{\mathrm{rat}}(T)\vee\mathcal{K}_{\mathrm{rat}}(S))$.

Proof. Notice that for every $h\in L^\infty(\mu)$ the averages of
$$\int h(x)\cdot f(T^{p(n)}x)\cdot g(T^{p(n)}S^nx)\,d\mu$$
are equal to the averages of
$$\int f(x)\cdot h(T^{-p(n)}x)\cdot g(S^nx)\,d\mu. \tag{49}$$
Theorem A.4 (applied to the transformations $S$ and $T^{-1}$) shows that the averages of (49) converge; therefore the averages (48) converge weakly. Furthermore, Theorem A.3 shows that the averages of (49) converge to 0 if either $g\perp\mathcal{K}_{\mathrm{rat}}(S)$ or $h\perp\mathcal{K}_{\mathrm{rat}}(T)$, and as a consequence they converge to 0 if $f\perp(\mathcal{K}_{\mathrm{rat}}(T)\vee\mathcal{K}_{\mathrm{rat}}(S))$. Therefore, if $g\perp\mathcal{K}_{\mathrm{rat}}(S)$ or $f\perp(\mathcal{K}_{\mathrm{rat}}(T)\vee\mathcal{K}_{\mathrm{rat}}(S))$, then the averages (48) converge weakly to 0. This completes the proof. □
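The first step of the proof is the change of variables $x\mapsto T^{-p(n)}x$, which uses only that $T$ preserves $\mu$ and commutes with $S$. A hypothetical finite model makes the identity easy to test: rotations $Tx=x+a$ and $Sx=x+b$ on $\mathbb{Z}/m\mathbb{Z}$ with counting measure (which differs from the uniform probability measure only by a constant factor), integer-valued functions, and $p(n)=n^2+n$.

```python
import random


def lhs(h, f, g, a, b, n, p):
    """sum_x h(x) f(T^{p(n)} x) g(T^{p(n)} S^n x) on Z/m, with T x = x + a, S x = x + b."""
    m = len(h)
    s = a * p(n)
    return sum(h[x] * f[(x + s) % m] * g[(x + s + b * n) % m] for x in range(m))


def rhs(h, f, g, a, b, n, p):
    """sum_x f(x) h(T^{-p(n)} x) g(S^n x): the same sum after substituting x -> T^{-p(n)} x."""
    m = len(h)
    s = a * p(n)
    return sum(f[x] * h[(x - s) % m] * g[(x + b * n) % m] for x in range(m))


if __name__ == "__main__":
    rng = random.Random(1)
    m = 17
    h, f, g = ([rng.randint(-5, 5) for _ in range(m)] for _ in range(3))
    p = lambda n: n * n + n
    print(all(lhs(h, f, g, 3, 7, n, p) == rhs(h, f, g, 3, 7, n, p) for n in range(30)))
```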

Finally we establish the following result: 

Theorem A.6. Let $(X,\mathcal{X},\mu,T,S)$ be a system and $A\in\mathcal{X}$. Let $p\in\mathbb{Z}[t]$ with $\deg(p)\ge2$ and $p(0)=0$.

Then for every $\varepsilon>0$ the set
$$\{n\in\mathbb{N} : \mu(A\cap T^{-n}A\cap S^{-p(n)}A)>\mu(A)^3-\varepsilon\}$$
has bounded gaps.

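Proof. Let $\varepsilon>0$. There exists $r\in\mathbb{N}$ such that
$$\|\mathbb{E}(\mathbf{1}_A\mid\mathcal{K}_r(T))-\mathbb{E}(\mathbf{1}_A\mid\mathcal{K}_{\mathrm{rat}}(T))\|_{L^2(\mu)}<\varepsilon/3,\qquad \|\mathbb{E}(\mathbf{1}_A\mid\mathcal{K}_r(S))-\mathbb{E}(\mathbf{1}_A\mid\mathcal{K}_{\mathrm{rat}}(S))\|_{L^2(\mu)}<\varepsilon/3. \tag{50}$$
It suffices to show that
$$\lim_{N-M\to\infty}\frac{1}{N-M}\sum_{n=M}^{N-1}\mu(A\cap T^{-rn}A\cap S^{-p(rn)}A)>\mu(A)^3-\varepsilon.$$
Indeed, if the set of $n$ with $\mu(A\cap T^{-rn}A\cap S^{-p(rn)}A)>\mu(A)^3-\varepsilon$ had arbitrarily long gaps, the averages over those gaps would be at most $\mu(A)^3-\varepsilon$; and the set in the statement contains $r$ times this set, so it has bounded gaps as well.
Using a straightforward modification of Theorem A.3, where $T^n$ is replaced with $T^{rn}$, we see that the previous limit is equal to the limit of the averages of
$$\int\mathbf{1}_A\cdot T^{-rn}\mathbb{E}(\mathbf{1}_A\mid\mathcal{K}_{\mathrm{rat}}(T))\cdot S^{-p(rn)}\mathbb{E}(\mathbf{1}_A\mid\mathcal{K}_{\mathrm{rat}}(S))\,d\mu.$$
Using (50) we see that the last limit is $\varepsilon$-close to the limit of the averages of
$$\int\mathbf{1}_A\cdot T^{-rn}\mathbb{E}(\mathbf{1}_A\mid\mathcal{K}_r(T))\cdot S^{-p(rn)}\mathbb{E}(\mathbf{1}_A\mid\mathcal{K}_r(S))\,d\mu.$$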
Since $T^rf=f$ for $\mathcal{K}_r(T)$-measurable functions $f$, $S^rf=f$ for $\mathcal{K}_r(S)$-measurable functions $f$, and $r\mid p(rn)$ for every $n\in\mathbb{N}$ (since $p(0)=0$), the last limit is equal to
$$\int\mathbf{1}_A\cdot\mathbb{E}(\mathbf{1}_A\mid\mathcal{K}_r(T))\cdot\mathbb{E}(\mathbf{1}_A\mid\mathcal{K}_r(S))\,d\mu.$$
By Lemma 7.4 the last integral is greater than or equal to $\mu(A)^3$, completing the proof. □
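Assuming, as its use here suggests, that Lemma 7.4 gives $\int\mathbf{1}_A\,\mathbb{E}(\mathbf{1}_A\mid\mathcal{B}_1)\,\mathbb{E}(\mathbf{1}_A\mid\mathcal{B}_2)\,d\mu\ge\mu(A)^3$ for arbitrary sub-$\sigma$-algebras $\mathcal{B}_1,\mathcal{B}_2$ (its statement is not reproduced in this section), the inequality can be spot-checked on random finite probability spaces, with each $\mathcal{B}_i$ generated by a random partition. The helper names below are hypothetical.

```python
import random


def cond_exp_indicator(A, weights, labels):
    """E(1_A | B), where B is generated by the partition into the classes {x : labels[x] == c}."""
    mass, hit = {}, {}
    for x, w in enumerate(weights):
        c = labels[x]
        mass[c] = mass.get(c, 0.0) + w
        hit[c] = hit.get(c, 0.0) + (w if x in A else 0.0)
    return [hit[labels[x]] / mass[labels[x]] for x in range(len(weights))]


def triple_integral(A, weights, labels1, labels2):
    """Return (int 1_A E(1_A|B1) E(1_A|B2) dmu, mu(A))."""
    e1 = cond_exp_indicator(A, weights, labels1)
    e2 = cond_exp_indicator(A, weights, labels2)
    integral = sum(weights[x] * e1[x] * e2[x] for x in A)
    return integral, sum(weights[x] for x in A)


def random_instance(rng, size=10, parts=3):
    """A random probability vector, a random set A and two random partitions."""
    raw = [rng.random() + 0.01 for _ in range(size)]
    total = sum(raw)
    weights = [w / total for w in raw]
    A = {x for x in range(size) if rng.random() < 0.5}
    labels1 = [rng.randrange(parts) for _ in range(size)]
    labels2 = [rng.randrange(parts) for _ in range(size)]
    return A, weights, labels1, labels2
```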